The Statistics and Machine Learning with R Workshop : Unlock the Power of Efficient Data Science Modeling with This Hands-On Guide / Liu Peng.

Author
Peng, Liu
Format
Book
Language
English
Edition
First edition.
Published/Created
  • Birmingham, England : Packt Publishing Ltd, [2023]
  • ©2023
Description
1 online resource (516 pages)

Details

Summary note
Learn the fundamentals of statistics and machine learning using R libraries for data processing, visualization, model training, and statistical inference.
Key Features
  • Advance your ML career with the help of detailed explanations, intuitive illustrations, and code examples
  • Gain practical insights into the real-world applications of statistics and machine learning
  • Explore the technicalities of statistics and machine learning for effective data presentation
  • Purchase of the print or Kindle book includes a free PDF eBook
Book Description
The Statistics and Machine Learning with R Workshop is a comprehensive resource packed with insights into statistics and machine learning, along with a deep dive into R libraries. The learning experience is further enhanced by practical examples and hands-on exercises that provide explanations of key concepts. Starting with the fundamentals, you'll explore the complete model development process, covering everything from data pre-processing to model development. In addition to machine learning, you'll also delve into R's statistical capabilities, learning to manipulate various data types and tackle complex mathematical challenges from algebra and calculus to probability and Bayesian statistics. You'll discover linear regression techniques and more advanced statistical methodologies to hone your skills and advance your career. By the end of this book, you'll have a robust foundational understanding of statistics and machine learning. You'll also be proficient in using R's extensive libraries for tasks such as data processing and model training and be well-equipped to leverage the full potential of R in your future projects.
What you will learn
  • Hone your skills in different probability distributions and hypothesis testing
  • Explore the fundamentals of linear algebra and calculus
  • Master crucial statistics and machine learning concepts in theory and practice
  • Discover essential data processing and visualization techniques
  • Engage in interactive data analysis using R
  • Use R to perform statistical modeling, including Bayesian and linear regression
Who this book is for
This book is for beginner to intermediate-level data scientists, undergraduate to master's-level students, and early to mid-senior data scientists or analysts looking to expand their knowledge of machine learning by exploring various R libraries. Basic knowledge of linear algebra and data modeling is a must.
Notes
Includes index.
Source of description
Description based on print version record.
Contents
  • Cover
  • Title Page
  • Copyright
  • Dedication
  • Contributors
  • Table of Contents
  • Preface
  • Part 1: Statistics Essentials
  • Chapter 1: Getting Started with R
  • Technical requirements
  • Introducing R
  • Covering the R and RStudio basics
  • Common data types in R
  • Common data structures in R
  • Vector
  • Matrix
  • Data frame
  • List
  • Control logic in R
  • Relational operators
  • Logical operators
  • Conditional statements
  • Loops
  • Exploring functions in R
  • Summary
  • Chapter 2: Data Processing with dplyr
  • Introducing tidyverse and dplyr
  • Data transformation with dplyr
  • Slicing the dataset using the filter() function
  • Sorting the dataset using the arrange() function
  • Adding or changing a column using the mutate() function
  • Selecting columns using the select() function
  • Selecting the top rows using the top_n() function
  • Combining the five verbs
  • Introducing other verbs
  • Data aggregation with dplyr
  • Counting observations using the count() function
  • Aggregating data via group_by() and summarize()
  • Data merging with dplyr
  • Case study - working with the Stack Overflow dataset
  • Chapter 3: Intermediate Data Processing
  • Transforming categorical and numeric variables
  • Recoding categorical variables
  • Creating variables using case_when()
  • Binning numeric variables using cut()
  • Reshaping the DataFrame
  • Converting from long format into wide format using spread()
  • Converting from wide format into long format using gather()
  • Manipulating string data
  • Creating strings
  • Converting numbers into strings
  • Connecting strings
  • Working with stringr
  • Basics of stringr
  • Pattern matching in a string
  • Splitting a string
  • Replacing a string
  • Putting it together
  • Introducing regular expressions
  • Working with tidy text mining
  • Converting text into tidy data using unnest_tokens()
  • Working with a document-term matrix
  • Chapter 4: Data Visualization with ggplot2
  • Introducing ggplot2
  • Building a scatter plot
  • Understanding the grammar of graphics
  • Geometries in graphics
  • Understanding geometry in scatter plots
  • Introducing bar charts
  • Introducing line plots
  • Controlling themes in graphics
  • Adjusting themes
  • Exploring ggthemes
  • Chapter 5: Exploratory Data Analysis
  • EDA fundamentals
  • Analyzing categorical data
  • Summarizing categorical variables using counts
  • Converting counts into proportions
  • Marginal distribution and faceted bar charts
  • Analyzing numerical data
  • Visualization in higher dimensions
  • Measuring the central concentration
  • Measuring variability
  • Working with skewed distributions
  • EDA in practice
  • Obtaining the stock price data
  • Univariate analysis of individual stock prices
  • Correlation analysis
  • Chapter 6: Effective Reporting with R Markdown
  • Fundamentals of R Markdown
  • Getting started with R Markdown
  • Getting to know the YAML header
  • Formatting textual information
  • Writing R code
  • Generating a financial analysis report
  • Getting and displaying the data
  • Performing data analysis
  • Adding plots to the report
  • Adding tables to the report
  • Configuring code chunks
  • Customizing R Markdown reports
  • Adding a table of contents
  • Creating a report with parameters
  • Customizing the report style
  • Part 2: Fundamentals of Linear Algebra and Calculus in R
  • Chapter 7: Linear Algebra in R
  • Introducing linear algebra
  • Working with vectors
  • Working with matrices
  • Matrix vector multiplication
  • Matrix multiplication
  • The identity matrix
  • Transposing a matrix
  • Inverting a matrix
  • Solving a system of linear equations
  • System of linear equations
  • The solution to matrix-vector equations
  • Geometric interpretation of solving a system of linear equations
  • Obtaining a unique solution to a system of linear equations
  • Overdetermined and underdetermined systems of linear equations
  • Chapter 8: Intermediate Linear Algebra in R
  • Introducing the matrix determinant
  • Interpreting the determinant
  • Connection to the matrix rank
  • Introducing the matrix trace
  • Special properties of the matrix trace
  • Understanding the matrix norm
  • Understanding the vector norm
  • Calculating the L1-norm of a vector
  • Calculating the L2-norm of a vector
  • Calculating the L∞-norm of a vector
  • Calculating the L1-norm of a matrix
  • Calculating the Frobenius norm of a matrix
  • Calculating the infinity norm of a matrix
  • Getting to know eigenvalues and eigenvectors
  • Understanding scalar-vector multiplication
  • Defining eigenvalues and eigenvectors
  • Computing eigenvalues and eigenvectors
  • Introducing principal component analysis
  • Understanding the variance-covariance matrix
  • Connecting to PCA
  • Performing PCA
  • Chapter 9: Calculus in R
  • Introducing calculus
  • Differential and integral calculus
  • More on functions
  • Vertical line test
  • Functional symmetry
  • Increasing and decreasing functions
  • Slope of a function
  • Function composition
  • Common functions
  • Understanding limits
  • Infinite limit
  • Limit at infinity
  • Introducing derivatives
  • Common derivatives
  • Common properties and rules of derivatives
  • Introducing integral calculus
  • Indefinite integrals
  • Indefinite integrals of basic functions
  • Properties of indefinite integrals
  • Integration by parts
  • Definite integrals
  • Working with calculus in R
  • Plotting basic functions
  • Working with derivatives
  • Using symbolic parameters
  • Working with the second derivative
  • Working with partial derivatives
  • Working with integration in R
  • More on antiderivatives
  • Evaluating the definite integral
  • Part 3: Fundamentals of Mathematical Statistics in R
  • Chapter 10: Probability Basics
  • Introducing probability distribution
  • Exploring common discrete probability distributions
  • The Bernoulli distribution
  • The binomial distribution
  • The Poisson distribution
  • Poisson approximation to binomial distribution
  • The geometric distribution
  • Comparing different discrete probability distributions
  • Discovering common continuous probability distributions
  • The normal distribution
  • The exponential distribution
  • Uniform distribution
  • Generating normally distributed random samples
  • Understanding common sampling distributions
  • Common sampling distributions
  • Understanding order statistics
  • Extracting order statistics
  • Calculating the value at risk
  • Chapter 11: Statistical Estimation
  • Statistical inference for categorical data
  • Statistical inference for a single parameter
  • Introducing the General Social Survey dataset
  • Calculating the sample proportion
  • Calculating the confidence interval
  • Interpreting the confidence interval of the sample proportion
  • Hypothesis testing for the sample proportion
  • Inference for the difference in sample proportions
  • Type I and Type II errors
  • Testing the independence of two categorical variables
  • Introducing the contingency table
  • Applying the chi-square test for independence between two categorical variables
  • Statistical inference for numerical data
  • Generating a bootstrap distribution for the median
  • Constructing the bootstrapped confidence interval
  • Re-centering a bootstrap distribution
  • Introducing the central limit theorem used in t-distribution
  • Constructing the confidence interval for the population mean using the t-distribution
  • Performing hypothesis testing for two means
  • Introducing ANOVA
  • Chapter 12: Linear Regression in R
  • Introducing linear regression
  • Understanding simple linear regression
  • Introducing multiple linear regression
  • Seeking a higher coefficient of determination
  • More on adjusted R²
  • Developing an MLR model
  • Introducing Simpson's Paradox
  • Working with categorical variables
  • Introducing the interaction term
  • Handling nonlinear terms
  • More on the logarithmic transformation
  • Working with the closed-form solution
  • Dealing with multicollinearity
  • Dealing with heteroskedasticity
  • Introducing penalized linear regression
  • Working with ridge regression
  • Working with lasso regression
  • Chapter 13: Logistic Regression in R
  • Introducing logistic regression
  • Understanding the sigmoid function
  • Grokking the logistic regression model
  • Comparing logistic regression with linear regression
  • Making predictions using the logistic regression model
  • More on log odds and odds ratio
  • Introducing the cross-entropy loss
  • Evaluating a logistic regression model
  • Dealing with an imbalanced dataset
  • Penalized logistic regression
  • Extending to multi-class classification
  • Chapter 14: Bayesian Statistics
  • Introducing Bayesian statistics
  • A first look into the Bayesian theorem
  • Understanding the generative model
  • Understanding prior distributions
  • Introducing the likelihood function
  • Introducing the posterior model
  • Diving deeper into Bayesian inference
ISBN
  • 9781803237756
  • 1803237759
OCLC
1402162010