Bridge the gap between business and data science by learning how to interpret machine learning and AI models, manage data teams, and achieve impactful results.
Key Features
Master the concepts of statistics and ML to interpret models and guide decisions
Identify valuable AI use cases and manage data science projects from start to finish
Empower top data science teams to solve complex problems and build AI products
Purchase of the print or Kindle book includes a free PDF eBook
Book Description
As data science and artificial intelligence (AI) become prevalent across industries, executives without formal education in statistics and machine learning, as well as data scientists moving into leadership roles, must learn how to make informed decisions about complex models and manage data teams. This book will elevate your leadership skills by guiding you through the core concepts of data science and AI. This comprehensive guide is designed to bridge the gap between business needs and technical solutions, empowering you to make informed decisions and drive measurable value within your organization. Through practical examples and clear explanations, you'll learn how to collect and analyze structured and unstructured data, build a strong foundation in statistics and machine learning, and evaluate models confidently. By recognizing common pitfalls and valuable use cases, you'll plan data science projects effectively, from the ground up to completion. Beyond technical aspects, this book provides tools to recruit top talent, manage high-performing teams, and stay up to date with industry advancements. By the end of this book, you'll be able to characterize the data within your organization and frame business problems as data science problems.
What you will learn
Discover how to interpret common statistical quantities and make data-driven decisions
Explore ML concepts as well as techniques in supervised, unsupervised, and reinforcement learning
Find out how to evaluate statistical and machine learning models
Understand the data science lifecycle, from development to monitoring of models in production
Know when to use ML, statistical modeling, or traditional BI methods
Manage data teams and data science projects effectively
Who this book is for
This book is designed for executives who want to understand and apply data science methods to enhance decision-making. It is also for individuals who work with or manage data scientists and machine learning engineers, such as chief data officers (CDOs), data science managers, and technical project managers.
Bibliographic references
Includes bibliographical references and index.
Source of description
Description based on publisher supplied metadata and other sources.
Description based on print version record.
Contents
Cover
Title Page
Copyright and Credits
Contributors
Table of Contents
Preface
Part 1: Understanding Data Science and Its Foundations
Introducing Data Science
Data science, AI, and ML - what's the difference?
The mathematical and statistical underpinnings of data science
Statistics and data science
What is statistics?
Descriptive and inferential statistics
Sampling strategies
Probability
Probability distribution
Conditional probability
Describing our samples
Measures of central tendency
Measures of dispersion
Degrees of freedom
Correlation, causation, and covariance
The shape of data
Probability distributions
Discrete probability distributions
Continuous probability distributions
Summary
Characterizing and Collecting Data
What are the key criteria to consider when evaluating datasets?
Data quantity
Data velocity
Data variety
Data quality
First-, second-, and third-party data
First-party data - the treasure trove within
Second-party data - building bridges through collaboration
Third-party data - broadening horizons with external expertise
Structured, unstructured, and semi-structured data
Structured data
Unstructured data
Semi-structured data
Methods for collecting data
Storing and processing data
Cloud, on-premises, and hybrid solutions - navigating the data storage and analysis landscape
Cloud computing - scalable services in the cloud
On-premises - maintaining control within your walls
Hybrid - the best of both worlds?
Data processing
Exploratory Data Analysis
Getting started with Google Colab
What is Google Colab?
A step-by-step guide to setting up Google Colab
Understanding the data you have
EDA techniques and tools
Descriptive statistics
Data visualization
Histograms
Density curves
Boxplots
Heatmaps
Dimensionality reduction
Correlation analysis
Outlier detection
The Significance of Significance
The idea of testing hypotheses
What is a hypothesis?
How does hypothesis testing work?
Formulating null and alternative hypotheses
Determining the significance level
Understanding errors
Getting to grips with p-values
Significance tests for a population proportion - making informed decisions about proportions
The z-test - comparing a sample proportion to a population proportion
Z-test example made easy
Significance tests for a population average (mean)
Writing hypotheses for a significance test about a mean
Conditions for a t-test about a mean
When to use z or t statistics in significance tests
Example - calculating the t-statistic for a test about a mean
Using a table to estimate the p-value from the t-statistic
Comparing the p-value from the t-statistic to the significance level
One-tailed and two-tailed tests
Walking through a case study
Understanding Regression
How can I benefit from understanding regression?
Introduction to trend lines
Fitting a trend line to data
Estimating the line of best fit
Calculating the equations of the lines of best fit
Interpreting the slope of a regression line
Interpreting the intercept of a regression line
Understanding residuals
Evaluating the goodness of fit in least-squares regression
Part 2: Machine Learning - Concepts, Applications, and Pitfalls
Introducing Machine Learning
From statistics to machine learning
What is machine learning?
How does machine learning relate to statistics?
Why is machine learning important?
Customer personalization and segmentation
Fraud detection and security
Supply chain and inventory optimization
Predictive maintenance
Healthcare diagnostics and treatment
The different types of machine learning
Supervised learning
Unsupervised learning
Semi-supervised learning
Reinforcement learning
Transfer learning
Popular machine learning algorithms
Linear regression
Logistic regression
Decision trees
Random forests
Support vector machines
k-nearest neighbors
Neural networks
The machine learning process
Training a supervised machine learning model
Validation of a supervised machine learning model
Testing a supervised machine learning model
Evaluating machine learning models
Risks and limitations of machine learning
Overfitting and underfitting
Bias and variance
Balanced dataset
Models are approximations of reality
Machine learning on unstructured data
Natural language processing (NLP)
Computer vision
Deep learning and artificial intelligence
Artificial intelligence
Deep learning
Supervised Machine Learning
Defining supervised learning
Applications of supervised learning
The two types of supervised learning
Key factors in supervised learning
Steps within supervised learning
Data preparation - laying the foundation
Algorithm selection - choosing the right tool
Model training - learning from data
Model evaluation - assessing performance
Prediction and deployment - putting the model to work
Characteristics of regression and classification algorithms
Regression algorithms
Classification algorithms
Key considerations in supervised learning
Evaluation metrics
Consumer goods
Retail
Manufacturing
Unsupervised Machine Learning
Defining UL
Practical examples of UL
Steps in UL
Step 1 - Data collection
Step 2 - Data preprocessing
Step 3 - Choosing the right model
Step 4 - Training the model
Step 5 - Interpretation and evaluation
In summary
Clustering - unveiling hidden patterns in your data
What is clustering?
How does clustering work?
k-means clustering
Practical applications of clustering
Evaluation metrics for clustering
Association rule learning
What is association rule learning?
The Apriori algorithm - a practical example
Applications of UL
Market segmentation
Anomaly detection
Feature extraction
Interpreting and Evaluating Machine Learning Models
How do I know whether this model will be accurate?
Evaluating on test (holdout) data
Understanding evaluation metrics
Evaluating regression models
R-squared
Root mean squared error
Mean absolute error
When and how to use each metric
Practical evaluation strategies
Summarizing the evaluation of regression models
Evaluating classification models
Classification model evaluation metrics
Precision, recall, and F1-score
Recall
F1-score
Methods for explaining machine learning models
Making sense of regression models - the power of coefficients