Scala machine learning projects : build real-world machine learning and deep learning projects with Scala / Md. Rezaul Karim.

Author

Karim, Md. Rezaul

Format

Book

Language

English

Edition

1st edition

Published/Created

Birmingham, England ; Mumbai, [India] : Packt, 2018.

Description

1 online resource (470 pages)

Available Online

Summary note

Powerful smart applications using deep learning algorithms to dominate numerical computing, deep learning, and functional programming.

About This Book
Explore machine learning techniques with prominent open source Scala libraries such as Spark ML, H2O, MXNet, Zeppelin, and DeepLearning4j
Solve real-world machine learning problems by delving into complex numerical computing with Scala functional programming in a scalable and faster way
Cover all key aspects, such as collection, storage, processing, analysis, and evaluation, required to build and deploy machine learning models on computing clusters using the Scala Play framework

Who This Book Is For
If you want to leverage the power of both Scala and Spark to make sense of Big Data, then this book is for you. If you are well versed in machine learning concepts and want to expand your knowledge by delving into the practical implementation using the power of Scala, then this book is what you need! A strong understanding of the Scala programming language is recommended. Basic familiarity with machine learning techniques will also be helpful.

What You Will Learn
Apply advanced regression techniques to boost the performance of predictive models
Use different classification algorithms for business analytics
Generate trading strategies for Bitcoin and stock trading using ensemble techniques
Train Deep Neural Networks (DNN) using H2O and Spark ML
Utilize NLP to build scalable machine learning models
Learn how to apply reinforcement learning algorithms such as Q-learning for developing ML applications
Learn how to use autoencoders to develop a fraud detection application
Implement LSTM and CNN models using DeepLearning4j and MXNet

In Detail
Machine learning has had a huge impact on academia and industry by turning data into actionable information. Scala has seen a steady rise in adoption over the past few years, especially in the fields of data science and analytics. This book is for data scientists, data engineers, and deep learning enthusiasts who have a background in complex numerical computing and want to know more about hands-on machine learning application development. If you're well versed in machine learning concepts and want to expand your knowledge by delving into the practical implementation of these concepts using the power of Scala, then this book is what you need! Through 11 end-to-end projects, you will be acquainted with popular machine learning libraries such as Spark ML, H2O, DeepLearning4j, and MXNet. At the end,...

Notes

Includes index.

Source of description

Description based on online resource; title from PDF title page (EBC, viewed March 6, 2018).

Contents

Cover
Copyright and Credits
Packt Upsell
Contributors
Table of Contents
Preface
Chapter 1: Analyzing Insurance Severity Claims
Machine learning and learning workflow
Typical machine learning workflow
Hyperparameter tuning and cross-validation
Analyzing and predicting insurance severity claims
Motivation
Description of the dataset
Exploratory analysis of the dataset
Data preprocessing
LR for predicting insurance severity claims
Developing insurance severity claims predictive model using LR
GBT regressor for predicting insurance severity claims
Boosting the performance using random forest regressor
Random Forest for classification and regression
Comparative analysis and model deployment
Spark-based model deployment for large-scale dataset
Summary
Chapter 2: Analyzing and Predicting Telecommunication Churn
Why do we perform churn analysis, and how do we do it?
Developing a churn analytics pipeline
Exploratory analysis and feature engineering
LR for churn prediction
SVM for churn prediction
DTs for churn prediction
Random Forest for churn prediction
Selecting the best model for deployment
Chapter 3: High Frequency Bitcoin Price Prediction from Historical and Live Data
Bitcoin, cryptocurrency, and online trading
State-of-the-art automated trading of Bitcoin
Training
Prediction
High-level data pipeline of the prototype
Historical and live-price data collection
Historical data collection
Transformation of historical data into a time series
Assumptions and design choices
Real-time data through the Cryptocompare API
Model training for prediction
Scala Play web service
Concurrency through Akka actors
Web service workflow
JobModule
Scheduler
SchedulerActor
PredictionActor and the prediction step
TraderActor
Predicting prices and evaluating the model
Demo prediction using Scala Play framework
Why RESTful architecture?
Project structure
Running the Scala Play web app
Chapter 4: Population-Scale Clustering and Ethnicity Prediction
Population scale clustering and geographic ethnicity
Machine learning for genetic variants
1000 Genomes Project dataset description
Algorithms, tools, and techniques
H2O and Sparkling water
ADAM for large-scale genomics data processing
Unsupervised machine learning
Population genomics and clustering
How does K-means work?
DNNs for geographic ethnicity prediction
Configuring programming environment
Data pre-processing and feature engineering
Model training and hyperparameter tuning
Spark-based K-means for population-scale clustering
Determining the number of optimal clusters
Using H2O for ethnicity prediction
Using random forest for ethnicity prediction
Chapter 5: Topic Modeling - A Better Insight into Large-Scale Texts
Topic modeling and text clustering
How does LDA algorithm work?
Topic modeling with Spark MLlib and Stanford NLP
Implementation
Step 1 - Creating a Spark session
Step 2 - Creating vocabulary and tokens count to train the LDA after text pre-processing
Step 3 - Instantiate the LDA model before training
Step 4 - Set the NLP optimizer
Step 5 - Training the LDA model
Step 6 - Prepare the topics of interest
Step 7 - Topic modelling
Step 8 - Measuring the likelihood of two documents
Other topic models versus the scalability of LDA
Deploying the trained LDA model
Chapter 6: Developing Model-based Movie Recommendation Engines
Recommendation system
Collaborative filtering approaches
Content-based filtering approaches
Hybrid recommender systems
Model-based collaborative filtering
The utility matrix
Spark-based movie recommendation systems
Item-based collaborative filtering for movie similarity
Step 1 - Importing necessary libraries and creating a Spark session
Step 2 - Reading and parsing the dataset
Step 3 - Computing similarity
Step 4 - Testing the model
Model-based recommendation with Spark
Data exploration
Movie recommendation using ALS
Step 1 - Import packages, load, parse, and explore the movie and rating dataset
Step 2 - Register both DataFrames as temp tables to make querying easier
Step 3 - Explore and query for related statistics
Step 4 - Prepare training and test rating data and check the counts
Step 5 - Prepare the data for building the recommendation model using ALS
Step 6 - Build an ALS user product matrix
Step 7 - Making predictions
Step 8 - Evaluating the model
Selecting and deploying the best model
Chapter 7: Options Trading Using Q-learning and Scala Play Framework
Reinforcement versus supervised and unsupervised learning
Using RL
Notation, policy, and utility in RL
Policy
Utility
A simple Q-learning implementation
Components of the Q-learning algorithm
States and actions in QLearning
The search space
The policy and action-value
QLearning model creation and training
QLearning model validation
Making predictions using the trained model
Developing an options trading web app using Q-learning
Problem description
Implementing an options trading web application
Creating an option property
Creating an option model
Putting it all together
Evaluating the model
Wrapping up the options trading app as a Scala web app
The backend
The frontend
Running and Deployment Instructions
Model deployment
Clients
Chapter 8: Subscription Assessment for Bank Telemarketing using Deep Neural Networks
Client subscription assessment through telemarketing
Dataset description
Installing and getting started with Apache Zeppelin
Building from the source
Starting and stopping Apache Zeppelin
Creating notebooks
Label distribution
Job distribution
Marital distribution
Education distribution
Default distribution
Housing distribution
Loan distribution
Contact distribution
Month distribution
Day distribution
Previous outcome distribution
Age feature
Duration distribution
Campaign distribution
Pdays distribution
Previous distribution
emp_var_rate distributions
cons_price_idx features
cons_conf_idx distribution
Euribor3m distribution
nr_employed distribution
Statistics of numeric features
Implementing a client subscription assessment model
Hyperparameter tuning and feature selection
Number of hidden layers
Number of neurons per hidden layer
Activation functions
Weight and bias initialization
Regularization
Chapter 9: Fraud Analytics Using Autoencoders and Anomaly Detection
Outlier and anomaly detection
Autoencoders and unsupervised learning
Working principles of an autoencoder
Efficient data representation with autoencoders
Developing a fraud analytics model
Description of the dataset and using linear models
Preparing programming environment
Step 1 - Loading required packages and libraries
Step 2 - Creating a Spark session and importing implicits
Step 3 - Loading and parsing input data
Step 4 - Exploratory analysis of the input data
Step 5 - Preparing the H2O DataFrame
Step 6 - Unsupervised pre-training using autoencoder
Step 7 - Dimensionality reduction with hidden layers
Step 8 - Anomaly detection
Step 9 - Pre-trained supervised model
Step 10 - Model evaluation on the highly-imbalanced data
Step 11 - Stopping the Spark session and H2O context
Auxiliary classes and methods
Chapter 10: Human Activity Recognition using Recurrent Neural Networks
Working with RNNs
Contextual information and the architecture of RNNs
RNN and the long-term dependency problem
LSTM networks
Human activity recognition using the LSTM model
Setting and configuring MXNet for Scala
Implementing an LSTM model for HAR
Step 1 - Importing necessary libraries and packages
Step 2 - Creating MXNet context
Step 3 - Loading and parsing the training and test set
Step 4 - Exploratory analysis of the dataset
Step 5 - Defining internal RNN structure and LSTM hyperparameters
Step 6 - LSTM network construction
Step 7 - Setting up an optimizer
Step 8 - Training the LSTM network
Step 9 - Evaluating the model
Tuning LSTM hyperparameters and GRU
Chapter 11: Image Classification using Convolutional Neural Networks
Image classification and drawbacks of DNNs
CNN architecture
Convolutional operations
Pooling layer and padding operations
Subsampling operations
Convolutional and subsampling operations in DL4j
Configuring DL4j, ND4s, and ND4j
Large-scale image classification using CNN
Description of the image dataset
Workflow of the overall project
Implementing CNNs for image classification
Image processing
Extracting image metadata
Image feature extraction
Preparing the ND4j dataset
Training the CNNs and saving the trained models
Evaluating the model

OCLC

1022793255

Princeton University Library Catalog
