Princeton University Library Catalog
Dimensionality reduction in data science / Max Garzon [and five others].
Format
Book
Language
English
Published/Created
Cham, Switzerland : Springer, [2022]
©2022
Description
1 online resource (268 pages) : illustrations
Details
Subject(s)
Big data
Data mining — Computer programs
Author
Garzón, Max
Summary note
This book provides a practical and fairly comprehensive review of data science through the lens of dimensionality reduction, along with hands-on techniques for tackling problems with data collected in the real world. State-of-the-art results and solutions from statistics, computer science, and mathematics are explained from the point of view of a practitioner in any domain science, such as biology, cybersecurity, chemistry, sports science, and many others. Quantitative and qualitative assessment methods are described for implementing and validating the solutions back in the real world, where the problems originated. The ability to generate, gather, and store data on the order of terabytes and exabytes daily has far outpaced our ability to derive useful information with the computational resources available in many domains. This book focuses on data science and problem definition; data cleansing; feature selection and extraction; statistical, geometric, information-theoretic, biomolecular, and machine learning methods for dimensionality reduction of big datasets and problem solving; and a comparative assessment of solutions in a real-world setting. The book targets professionals working in related fields who hold an undergraduate degree in any science area, particularly a quantitative one. Readers should be able to follow the examples in this book that introduce each method or technique. These motivating examples are followed by precise definitions of the required technical concepts and a presentation of the results in general situations. These concepts require a degree of abstraction that can be followed by reinterpreting them as in the original example(s). Finally, each section closes with solutions to the original problem(s) afforded by these techniques, often in several ways, so as to compare and contrast their advantages and disadvantages relative to other solutions.
Bibliographic references
Includes bibliographical references.
Source of description
Description based on print version record.
Contents
Intro
Preface
Contents
Acronyms
1 What Is Data Science (DS)?
1.1 Major Families of Data Science Problems
1.1.1 Classification Problems
1.1.2 Prediction Problems
1.1.3 Clustering Problems
1.2 Data, Big Data, and Pre-processing
1.2.1 What Is Data?
1.2.2 Big Data
1.2.3 Data Cleansing
1.2.3.1 Duplication
1.2.3.2 Fixing/Removing Errors
1.2.3.3 Missing Data
1.2.3.4 Outliers
1.2.3.5 Multicollinearity
1.2.4 Data Visualization
1.2.5 Data Understanding
1.3 Populations and Data Sampling
1.3.1 Sampling
1.3.2 Training, Testing, and Validation
1.4 Overview and Scope
1.4.1 Prerequisites and Layout
1.4.2 Data Science Methodology
1.4.3 Scope of the Book
Reference
2 Solutions to Data Science Problems
2.1 Conventional Statistical Solutions
2.1.1 Linear Multiple Regression Model: Continuous Response
2.1.1.1 Akaike Information Criterion (AIC)
2.1.1.2 Bayesian Information Criterion (BIC)
2.1.1.3 Adjusted R-Squared
2.1.2 Logistic Regression: Categorical Response
2.1.3 Variable Selection and Model Building
2.1.4 Generalized Linear Model (GLM)
2.1.5 Decision Trees
2.1.6 Bayesian Learning
2.2 Machine Learning Solutions: Supervised
2.2.1 k-Nearest Neighbors (kNN)
2.2.2 Ensemble Methods
2.2.3 Support Vector Machines (SVMs)
2.2.4 Neural Networks (NNs)
2.3 Machine Learning Solutions: Unsupervised
2.3.1 Hard Clustering
2.3.2 Soft Clustering
2.4 Controls, Evaluation, and Assessment
2.4.1 Evaluation Methods
2.4.2 Metrics for Assessment
References
3 What Is Dimensionality Reduction (DR)?
3.1 Dimensionality Reduction
3.2 Major Approaches to Dimensionality Reduction
3.2.1 Conventional Statistical Approaches
3.2.2 Geometric Approaches
3.2.3 Information-Theoretic Approaches
3.2.4 Molecular Computing Approaches
3.3 The Blessings of Dimensionality
4 Conventional Statistical Approaches
4.1 Principal Component Analysis (PCA)
4.1.1 Obtaining the Principal Components
4.1.2 Singular Value Decomposition (SVD)
4.2 Nonlinear PCA
4.2.1 Kernel PCA
4.2.2 Independent Component Analysis (ICA)
4.3 Nonnegative Matrix Factorization (NMF)
4.3.1 Approximate Solutions
4.3.2 Clustering and Other Applications
4.4 Discriminant Analysis
4.4.1 Linear Discriminant Analysis (LDA)
4.4.2 Quadratic Discriminant Analysis (QDA)
4.5 Sliced Inverse Regression (SIR)
5 Geometric Approaches
5.1 Introduction to Manifolds
5.2 Manifold Learning Methods
5.2.1 Multi-Dimensional Scaling (MDS)
5.2.1.1 Classical MDS: Spectral Approach
5.2.1.2 Metric MDS: Optimization-Based Approach
5.2.2 Isometric Mapping (ISOMAP)
5.2.3 t-Stochastic Neighbor Embedding (t-SNE)
5.3 Exploiting Randomness (RND)
6 Information-Theoretic Approaches
6.1 Shannon Entropy (H)
6.2 Reduction by Conditional Entropy
6.3 Reduction by Iterated Conditional Entropy
6.4 Reduction by Conditional Entropy on Targets
6.5 Other Variations
7 Molecular Computing Approaches
7.1 Encoding Abiotic Data into DNA
7.2 Deep Structure of DNA Spaces
7.2.1 Structural Properties of DNA Spaces
7.2.2 Noncrosshybridizing (nxh) Bases
7.3 Reduction by Genomic Signatures
7.3.1 Background
7.3.2 Genomic Signatures
7.4 Reduction by Pmeric Signatures
8 Statistical Learning Approaches
8.1 Reduction by Multiple Regression
8.2 Reduction by Ridge Regression
8.3 Reduction by Lasso Regression
8.4 Selection Versus Shrinkage
8.5 Further Refinements
9 Machine Learning Approaches
9.1 Autoassociative Feature Encoders
9.1.1 Undercomplete Autoencoders
9.1.2 Sparse Autoencoders
9.1.3 Variational Autoencoders
9.1.4 Dimensionality Reduction in MNIST Images
9.2 Neural Feature Selection
9.2.1 Facial Features, Expressions, and Displays
9.2.2 The Cohn-Kanade Dataset
9.2.3 Primary and Derived Features
9.3 Other Methods
10 Metaheuristics of DR Methods
10.1 Exploiting Feature Grouping
10.2 Exploiting Domain Knowledge
10.2.1 What Is Domain Knowledge?
10.2.2 Domain Knowledge for Dimensionality Reduction
10.3 Heuristic Rules for Feature Selection, Extraction, and Number
10.4 About Explainability of Solutions
10.4.1 What Is Explainability?
10.4.1.1 Outcome Explanations
10.4.1.2 Model Explanations
10.4.2 Explainability in Dimensionality Reduction
10.5 Choosing Wisely
10.6 About the Curse of Dimensionality
10.7 About the No-Free-Lunch Theorem (NFL)
11 Appendices
11.1 Statistics and Probability Background
11.1.1 Commonly Used Discrete Distributions
11.1.2 Commonly Used Continuous Distributions
11.1.3 Major Results in Probability and Statistics
11.2 Linear Algebra Background
11.2.1 Fields, Vector Spaces and Subspaces
11.2.2 Linear Independence, Bases and Dimension
11.2.3 Linear Transformations and Matrices
11.2.4 Eigenvalues and Spectral Decomposition
11.3 Computer Science Background
11.3.1 Computational Science and Complexity
11.3.2 Machine Learning
11.4 Typical Data Science Problems
11.5 A Sample of Common and Big Datasets
11.6 Computing Platforms
11.6.1 The Environment R
11.6.2 Python Environments
References
ISBN
3-031-05371-0