Learn the fundamentals of statistics and machine learning using R libraries for data processing, visualization, model training, and statistical inference
Key Features
Advance your ML career with the help of detailed explanations, intuitive illustrations, and code examples
Gain practical insights into the real-world applications of statistics and machine learning
Explore the technicalities of statistics and machine learning for effective data presentation
Purchase of the print or Kindle book includes a free PDF eBook
Book Description
The Statistics and Machine Learning with R Workshop is a comprehensive resource packed with insights into statistics and machine learning, along with a deep dive into R libraries. The learning experience is further enhanced by practical examples and hands-on exercises that provide explanations of key concepts. Starting with the fundamentals, you'll explore the complete model development process, covering everything from data pre-processing to model development. In addition to machine learning, you'll also delve into R's statistical capabilities, learning to manipulate various data types and tackle complex mathematical challenges from algebra and calculus to probability and Bayesian statistics. You'll discover linear regression techniques and more advanced statistical methodologies to hone your skills and advance your career. By the end of this book, you'll have a robust foundational understanding of statistics and machine learning. You'll also be proficient in using R's extensive libraries for tasks such as data processing and model training and be well-equipped to leverage the full potential of R in your future projects.
What you will learn
Hone your skills in different probability distributions and hypothesis testing
Explore the fundamentals of linear algebra and calculus
Master crucial statistics and machine learning concepts in theory and practice
Discover essential data processing and visualization techniques
Engage in interactive data analysis using R
Use R to perform statistical modeling, including Bayesian and linear regression
Who this book is for
This book is for beginner to intermediate-level data scientists, undergraduate to masters-level students, and early to mid-senior data scientists or analysts looking to expand their knowledge of machine learning by exploring various R libraries. Basic knowledge of linear algebra and data modeling is a must.
Notes
Includes index.
Source of description
Description based on print version record.
Contents
Cover
Title Page
Copyright
Dedication
Contributors
Table of Contents
Preface
Part 1: Statistics Essentials
Chapter 1: Getting Started with R
Technical requirements
Introducing R
Covering the R and RStudio basics
Common data types in R
Common data structures in R
Vector
Matrix
Data frame
List
Control logic in R
Relational operators
Logical operators
Conditional statements
Loops
Exploring functions in R
Summary
Chapter 2: Data Processing with dplyr
Introducing tidyverse and dplyr
Data transformation with dplyr
Slicing the dataset using the filter() function
Sorting the dataset using the arrange() function
Adding or changing a column using the mutate() function
Selecting columns using the select() function
Selecting the top rows using the top_n() function
Combining the five verbs
Introducing other verbs
Data aggregation with dplyr
Counting observations using the count() function
Aggregating data via group_by() and summarize()
Data merging with dplyr
Case study - working with the Stack Overflow dataset
Chapter 3: Intermediate Data Processing
Transforming categorical and numeric variables
Recoding categorical variables
Creating variables using case_when()
Binning numeric variables using cut()
Reshaping the DataFrame
Converting from long format into wide format using spread()
Converting from wide format into long format using gather()
Manipulating string data
Creating strings
Converting numbers into strings
Connecting strings
Working with stringr
Basics of stringr
Pattern matching in a string
Splitting a string
Replacing a string
Putting it together
Introducing regular expressions
Working with tidy text mining
Converting text into tidy data using unnest_tokens()
Working with a document-term matrix
Chapter 4: Data Visualization with ggplot2
Introducing ggplot2
Building a scatter plot
Understanding the grammar of graphics
Geometries in graphics
Understanding geometry in scatter plots
Introducing bar charts
Introducing line plots
Controlling themes in graphics
Adjusting themes
Exploring ggthemes
Chapter 5: Exploratory Data Analysis
EDA fundamentals
Analyzing categorical data
Summarizing categorical variables using counts
Converting counts into proportions
Marginal distribution and faceted bar charts
Analyzing numerical data
Visualization in higher dimensions
Measuring the central concentration
Measuring variability
Working with skewed distributions
EDA in practice
Obtaining the stock price data
Univariate analysis of individual stock prices
Correlation analysis
Chapter 6: Effective Reporting with R Markdown
Fundamentals of R Markdown
Getting started with R Markdown
Getting to know the YAML header
Formatting textual information
Writing R code
Generating a financial analysis report
Getting and displaying the data
Performing data analysis
Adding plots to the report
Adding tables to the report
Configuring code chunks
Customizing R Markdown reports
Adding a table of contents
Creating a report with parameters
Customizing the report style
Part 2: Fundamentals of Linear Algebra and Calculus in R
Chapter 7: Linear Algebra in R
Introducing linear algebra
Working with vectors
Working with matrices
Matrix vector multiplication
Matrix multiplication
The identity matrix
Transposing a matrix
Inverting a matrix
Solving a system of linear equations
System of linear equations
The solution to matrix-vector equations
Geometric interpretation of solving a system of linear equations
Obtaining a unique solution to a system of linear equations
Overdetermined and underdetermined systems of linear equations
Chapter 8: Intermediate Linear Algebra in R
Introducing the matrix determinant
Interpreting the determinant
Connection to the matrix rank
Introducing the matrix trace
Special properties of the matrix trace
Understanding the matrix norm
Understanding the vector norm
Calculating the L1-norm of a vector
Calculating the L2-norm of a vector
Calculating the L∞-norm of a vector
Calculating the L1-norm of a matrix
Calculating the Frobenius norm of a matrix
Calculating the infinity norm of a matrix
Getting to know eigenvalues and eigenvectors
Understanding scalar-vector multiplication
Defining eigenvalues and eigenvectors
Computing eigenvalues and eigenvectors
Introducing principal component analysis
Understanding the variance-covariance matrix
Connecting to PCA
Performing PCA
Chapter 9: Calculus in R
Introducing calculus
Differential and integral calculus
More on functions
Vertical line test
Functional symmetry
Increasing and decreasing functions
Slope of a function
Function composition
Common functions
Understanding limits
Infinite limit
Limit at infinity
Introducing derivatives
Common derivatives
Common properties and rules of derivatives
Introducing integral calculus
Indefinite integrals
Indefinite integrals of basic functions
Properties of indefinite integrals
Integration by parts
Definite integrals
Working with calculus in R
Plotting basic functions
Working with derivatives
Using symbolic parameters
Working with the second derivative
Working with partial derivatives
Working with integration in R
More on antiderivatives
Evaluating the definite integral
Part 3: Fundamentals of Mathematical Statistics in R
Chapter 10: Probability Basics
Introducing probability distribution
Exploring common discrete probability distributions
The Bernoulli distribution
The binomial distribution
The Poisson distribution
Poisson approximation to binomial distribution
The geometric distribution
Comparing different discrete probability distributions
Discovering common continuous probability distributions
The normal distribution
The exponential distribution
Uniform distribution
Generating normally distributed random samples
Understanding common sampling distributions
Common sampling distributions
Understanding order statistics
Extracting order statistics
Calculating the value at risk
Chapter 11: Statistical Estimation
Statistical inference for categorical data
Statistical inference for a single parameter
Introducing the General Social Survey dataset
Calculating the sample proportion
Calculating the confidence interval
Interpreting the confidence interval of the sample proportion
Hypothesis testing for the sample proportion
Inference for the difference in sample proportions
Type I and Type II errors
Testing the independence of two categorical variables
Introducing the contingency table
Applying the chi-square test for independence between two categorical variables
Statistical inference for numerical data
Generating a bootstrap distribution for the median
Constructing the bootstrapped confidence interval
Re-centering a bootstrap distribution
Introducing the central limit theorem used in t-distribution
Constructing the confidence interval for the population mean using the t-distribution
Performing hypothesis testing for two means
Introducing ANOVA
Chapter 12: Linear Regression in R
Introducing linear regression
Understanding simple linear regression
Introducing multiple linear regression
Seeking a higher coefficient of determination
More on adjusted R²
Developing an MLR model
Introducing Simpson's Paradox
Working with categorical variables
Introducing the interaction term
Handling nonlinear terms
More on the logarithmic transformation
Working with the closed-form solution
Dealing with multicollinearity
Dealing with heteroskedasticity
Introducing penalized linear regression
Working with ridge regression
Working with lasso regression
Chapter 13: Logistic Regression in R
Introducing logistic regression
Understanding the sigmoid function
Grokking the logistic regression model
Comparing logistic regression with linear regression
Making predictions using the logistic regression model
More on log odds and odds ratio
Introducing the cross-entropy loss
Evaluating a logistic regression model
Dealing with an imbalanced dataset
Penalized logistic regression
Extending to multi-class classification
Chapter 14: Bayesian Statistics
Introducing Bayesian statistics
A first look into the Bayesian theorem
Understanding the generative model
Understanding prior distributions
Introducing the likelihood function
Introducing the posterior model
Diving deeper into Bayesian inference
ISBN
9781803237756
1803237759
OCLC
1402162010