Scala machine learning projects : build real-world machine learning and deep learning projects with Scala / Md. Rezaul Karim.

Author
Karim, Md. Rezaul
Format
Book
Language
English
Edition
1st edition
Published/Created
  • Birmingham, England ; Mumbai, [India] : Packt, 2018.
  • 2018
Description
1 online resource (470 pages)

Details

Summary note
Powerful smart applications using deep learning algorithms to dominate numerical computing, deep learning, and functional programming.
About This Book
  • Explore machine learning techniques with prominent open source Scala libraries such as Spark ML, H2O, MXNet, Zeppelin, and DeepLearning4j
  • Solve real-world machine learning problems by delving into complex numerical computing with Scala functional programming in a scalable and faster way
  • Cover all key aspects such as collection, storing, processing, analyzing, and evaluation required to build and deploy machine learning models on computing clusters using the Scala Play framework
Who This Book Is For
If you want to leverage the power of both Scala and Spark to make sense of Big Data, then this book is for you. If you are well versed in machine learning concepts and want to expand your knowledge by delving into the practical implementation using the power of Scala, then this book is what you need! A strong understanding of the Scala programming language is recommended. Basic familiarity with machine learning techniques will also be helpful.
What You Will Learn
  • Apply advanced regression techniques to boost the performance of predictive models
  • Use different classification algorithms for business analytics
  • Generate trading strategies for Bitcoin and stock trading using ensemble techniques
  • Train Deep Neural Networks (DNN) using H2O and Spark ML
  • Utilize NLP to build scalable machine learning models
  • Learn how to apply reinforcement learning algorithms such as Q-learning for developing an ML application
  • Learn how to use autoencoders to develop a fraud detection application
  • Implement LSTM and CNN models using DeepLearning4j and MXNet
In Detail
Machine learning has had a huge impact on academia and industry by turning data into actionable information. Scala has seen a steady rise in adoption over the past few years, especially in the fields of data science and analytics.
This book is for data scientists, data engineers, and deep learning enthusiasts who have a background in complex numerical computing and want to know more about hands-on machine learning application development. If you're well versed in machine learning concepts and want to expand your knowledge by delving into the practical implementation of these concepts using the power of Scala, then this book is what you need! Through 11 end-to-end projects, you will be acquainted with popular machine learning libraries such as Spark ML, H2O, DeepLearning4j, and MXNet. At the end,...
Notes
Includes index.
Source of description
Description based on online resource; title from PDF title page (EBC, viewed March 6, 2018).
Contents
  • Cover
  • Copyright and Credits
  • Packt Upsell
  • Contributors
  • Table of Contents
  • Preface
  • Chapter 1: Analyzing Insurance Severity Claims
  • Machine learning and learning workflow
  • Typical machine learning workflow
  • Hyperparameter tuning and cross-validation
  • Analyzing and predicting insurance severity claims
  • Motivation
  • Description of the dataset
  • Exploratory analysis of the dataset
  • Data preprocessing
  • LR for predicting insurance severity claims
  • Developing insurance severity claims predictive model using LR
  • GBT regressor for predicting insurance severity claims
  • Boosting the performance using random forest regressor
  • Random Forest for classification and regression
  • Comparative analysis and model deployment
  • Spark-based model deployment for large-scale dataset
  • Summary
  • Chapter 2: Analyzing and Predicting Telecommunication Churn
  • Why do we perform churn analysis, and how do we do it?
  • Developing a churn analytics pipeline
  • Exploratory analysis and feature engineering
  • LR for churn prediction
  • SVM for churn prediction
  • DTs for churn prediction
  • Random Forest for churn prediction
  • Selecting the best model for deployment
  • Chapter 3: High Frequency Bitcoin Price Prediction from Historical and Live Data
  • Bitcoin, cryptocurrency, and online trading
  • State-of-the-art automated trading of Bitcoin
  • Training
  • Prediction
  • High-level data pipeline of the prototype
  • Historical and live-price data collection
  • Historical data collection
  • Transformation of historical data into a time series
  • Assumptions and design choices
  • Real-time data through the Cryptocompare API
  • Model training for prediction
  • Scala Play web service
  • Concurrency through Akka actors
  • Web service workflow
  • JobModule
  • Scheduler
  • SchedulerActor
  • PredictionActor and the prediction step
  • TraderActor
  • Predicting prices and evaluating the model
  • Demo prediction using Scala Play framework
  • Why RESTful architecture?
  • Project structure
  • Running the Scala Play web app
  • Chapter 4: Population-Scale Clustering and Ethnicity Prediction
  • Population scale clustering and geographic ethnicity
  • Machine learning for genetic variants
  • 1000 Genomes Projects dataset description
  • Algorithms, tools, and techniques
  • H2O and Sparkling water
  • ADAM for large-scale genomics data processing
  • Unsupervised machine learning
  • Population genomics and clustering
  • How does K-means work?
  • DNNs for geographic ethnicity prediction
  • Configuring programming environment
  • Data pre-processing and feature engineering
  • Model training and hyperparameter tuning
  • Spark-based K-means for population-scale clustering
  • Determining the number of optimal clusters
  • Using H2O for ethnicity prediction
  • Using random forest for ethnicity prediction
  • Chapter 5: Topic Modeling - A Better Insight into Large-Scale Texts
  • Topic modeling and text clustering
  • How does LDA algorithm work?
  • Topic modeling with Spark MLlib and Stanford NLP
  • Implementation
  • Step 1 - Creating a Spark session
  • Step 2 - Creating vocabulary and tokens count to train the LDA after text pre-processing
  • Step 3 - Instantiate the LDA model before training
  • Step 4 - Set the NLP optimizer
  • Step 5 - Training the LDA model
  • Step 6 - Prepare the topics of interest
  • Step 7 - Topic modelling
  • Step 8 - Measuring the likelihood of two documents
  • Other topic models versus the scalability of LDA
  • Deploying the trained LDA model
  • Chapter 6: Developing Model-based Movie Recommendation Engines
  • Recommendation system
  • Collaborative filtering approaches
  • Content-based filtering approaches
  • Hybrid recommender systems
  • Model-based collaborative filtering
  • The utility matrix
  • Spark-based movie recommendation systems
  • Item-based collaborative filtering for movie similarity
  • Step 1 - Importing necessary libraries and creating a Spark session
  • Step 2 - Reading and parsing the dataset
  • Step 3 - Computing similarity
  • Step 4 - Testing the model
  • Model-based recommendation with Spark
  • Data exploration
  • Movie recommendation using ALS
  • Step 1 - Import packages, load, parse, and explore the movie and rating dataset
  • Step 2 - Register both DataFrames as temp tables to make querying easier
  • Step 3 - Explore and query for related statistics
  • Step 4 - Prepare training and test rating data and check the counts
  • Step 5 - Prepare the data for building the recommendation model using ALS
  • Step 6 - Build an ALS user product matrix
  • Step 7 - Making predictions
  • Step 8 - Evaluating the model
  • Selecting and deploying the best model
  • Chapter 7: Options Trading Using Q-learning and Scala Play Framework
  • Reinforcement versus supervised and unsupervised learning
  • Using RL
  • Notation, policy, and utility in RL
  • Policy
  • Utility
  • A simple Q-learning implementation
  • Components of the Q-learning algorithm
  • States and actions in QLearning
  • The search space
  • The policy and action-value
  • QLearning model creation and training
  • QLearning model validation
  • Making predictions using the trained model
  • Developing an options trading web app using Q-learning
  • Problem description
  • Implementing an options trading web application
  • Creating an option property
  • Creating an option model
  • Putting it all together
  • Evaluating the model
  • Wrapping up the options trading app as a Scala web app
  • The backend
  • The frontend
  • Running and Deployment Instructions
  • Model deployment
  • Chapter 8: Clients Subscription Assessment for Bank Telemarketing using Deep Neural Networks
  • Client subscription assessment through telemarketing
  • Dataset description
  • Installing and getting started with Apache Zeppelin
  • Building from the source
  • Starting and stopping Apache Zeppelin
  • Creating notebooks
  • Label distribution
  • Job distribution
  • Marital distribution
  • Education distribution
  • Default distribution
  • Housing distribution
  • Loan distribution
  • Contact distribution
  • Month distribution
  • Day distribution
  • Previous outcome distribution
  • Age feature
  • Duration distribution
  • Campaign distribution
  • Pdays distribution
  • Previous distribution
  • emp_var_rate distributions
  • cons_price_idx features
  • cons_conf_idx distribution
  • Euribor3m distribution
  • nr_employed distribution
  • Statistics of numeric features
  • Implementing a client subscription assessment model
  • Hyperparameter tuning and feature selection
  • Number of hidden layers
  • Number of neurons per hidden layer
  • Activation functions
  • Weight and bias initialization
  • Regularization
  • Chapter 9: Fraud Analytics Using Autoencoders and Anomaly Detection
  • Outlier and anomaly detection
  • Autoencoders and unsupervised learning
  • Working principles of an autoencoder
  • Efficient data representation with autoencoders
  • Developing a fraud analytics model
  • Description of the dataset and using linear models
  • Preparing programming environment
  • Step 1 - Loading required packages and libraries
  • Step 2 - Creating a Spark session and importing implicits
  • Step 3 - Loading and parsing input data
  • Step 4 - Exploratory analysis of the input data
  • Step 5 - Preparing the H2O DataFrame
  • Step 6 - Unsupervised pre-training using autoencoder
  • Step 7 - Dimensionality reduction with hidden layers
  • Step 8 - Anomaly detection
  • Step 9 - Pre-trained supervised model
  • Step 10 - Model evaluation on the highly-imbalanced data
  • Step 11 - Stopping the Spark session and H2O context
  • Auxiliary classes and methods
  • Chapter 10: Human Activity Recognition using Recurrent Neural Networks
  • Working with RNNs
  • Contextual information and the architecture of RNNs
  • RNN and the long-term dependency problem
  • LSTM networks
  • Human activity recognition using the LSTM model
  • Setting and configuring MXNet for Scala
  • Implementing an LSTM model for HAR
  • Step 1 - Importing necessary libraries and packages
  • Step 2 - Creating MXNet context
  • Step 3 - Loading and parsing the training and test set
  • Step 4 - Exploratory analysis of the dataset
  • Step 5 - Defining internal RNN structure and LSTM hyperparameters
  • Step 6 - LSTM network construction
  • Step 7 - Setting up an optimizer
  • Step 8 - Training the LSTM network
  • Step 9 - Evaluating the model
  • Tuning LSTM hyperparameters and GRU
  • Chapter 11: Image Classification using Convolutional Neural Networks
  • Image classification and drawbacks of DNNs
  • CNN architecture
  • Convolutional operations
  • Pooling layer and padding operations
  • Subsampling operations
  • Convolutional and subsampling operations in DL4j
  • Configuring DL4j, ND4s, and ND4j
  • Large-scale image classification using CNN
  • Description of the image dataset
  • Workflow of the overall project
  • Implementing CNNs for image classification
  • Image processing
  • Extracting image metadata
  • Image feature extraction
  • Preparing the ND4j dataset
  • Training the CNNs and saving the trained models
  • Evaluating the model
OCLC
1022793255