Dimensionality Reduction in Data Science.

Author
Garzon, Max
Format
Book
Language
English
Published/Created
  • Cham : Springer International Publishing AG, 2022.
  • ©2022.
Description
1 online resource (268 pages)

Details

Source of description
Description based on publisher supplied metadata and other sources.
Contents
  • Intro -- Preface -- Contents -- Acronyms -- 1 What Is Data Science (DS)? -- 1.1 Major Families of Data Science Problems -- 1.1.1 Classification Problems -- 1.1.2 Prediction Problems -- 1.1.3 Clustering Problems -- 1.2 Data, Big Data, and Pre-processing -- 1.2.1 What Is Data? -- 1.2.2 Big Data -- 1.2.3 Data Cleansing -- 1.2.3.1 Duplication -- 1.2.3.2 Fixing/Removing Errors -- 1.2.3.3 Missing Data -- 1.2.3.4 Outliers -- 1.2.3.5 Multicollinearity -- 1.2.4 Data Visualization -- 1.2.5 Data Understanding -- 1.3 Populations and Data Sampling -- 1.3.1 Sampling -- 1.3.2 Training, Testing, and Validation -- 1.4 Overview and Scope -- 1.4.1 Prerequisites and Layout -- 1.4.2 Data Science Methodology -- 1.4.3 Scope of the Book -- Reference -- 2 Solutions to Data Science Problems -- 2.1 Conventional Statistical Solutions -- 2.1.1 Linear Multiple Regression Model: Continuous Response -- 2.1.1.1 Akaike Information Criterion (AIC) -- 2.1.1.2 Bayesian Information Criterion (BIC) -- 2.1.1.3 Adjusted R-Squared -- 2.1.2 Logistic Regression: Categorical Response -- 2.1.3 Variable Selection and Model Building -- 2.1.4 Generalized Linear Model (GLM) -- 2.1.5 Decision Trees -- 2.1.6 Bayesian Learning -- 2.2 Machine Learning Solutions: Supervised -- 2.2.1 k-Nearest Neighbors (kNN) -- 2.2.2 Ensemble Methods -- 2.2.3 Support Vector Machines (SVMs) -- 2.2.4 Neural Networks (NNs) -- 2.3 Machine Learning Solutions: Unsupervised -- 2.3.1 Hard Clustering -- 2.3.2 Soft Clustering -- 2.4 Controls, Evaluation, and Assessment -- 2.4.1 Evaluation Methods -- 2.4.2 Metrics for Assessment -- References -- 3 What Is Dimensionality Reduction (DR)? -- 3.1 Dimensionality Reduction -- 3.2 Major Approaches to Dimensionality Reduction -- 3.2.1 Conventional Statistical Approaches -- 3.2.2 Geometric Approaches -- 3.2.3 Information-Theoretic Approaches -- 3.2.4 Molecular Computing Approaches.
  • 3.3 The Blessings of Dimensionality -- References -- 4 Conventional Statistical Approaches -- 4.1 Principal Component Analysis (PCA) -- 4.1.1 Obtaining the Principal Components -- 4.1.2 Singular Value Decomposition (SVD) -- 4.2 Nonlinear PCA -- 4.2.1 Kernel PCA -- 4.2.2 Independent Component Analysis (ICA) -- 4.3 Nonnegative Matrix Factorization (NMF) -- 4.3.1 Approximate Solutions -- 4.3.2 Clustering and Other Applications -- 4.4 Discriminant Analysis -- 4.4.1 Linear Discriminant Analysis (LDA) -- 4.4.2 Quadratic Discriminant Analysis (QDA) -- 4.5 Sliced Inverse Regression (SIR) -- References -- 5 Geometric Approaches -- 5.1 Introduction to Manifolds -- 5.2 Manifold Learning Methods -- 5.2.1 Multi-Dimensional Scaling (MDS) -- 5.2.1.1 Classical MDS: Spectral Approach -- 5.2.1.2 Metric MDS: Optimization-Based Approach -- 5.2.2 Isometric Mapping (ISOMAP) -- 5.2.3 t-Stochastic Neighbor Embedding (t-SNE) -- 5.3 Exploiting Randomness (RND) -- References -- 6 Information-Theoretic Approaches -- 6.1 Shannon Entropy (H) -- 6.2 Reduction by Conditional Entropy -- 6.3 Reduction by Iterated Conditional Entropy -- 6.4 Reduction by Conditional Entropy on Targets -- 6.5 Other Variations -- References -- 7 Molecular Computing Approaches -- 7.1 Encoding Abiotic Data into DNA -- 7.2 Deep Structure of DNA Spaces -- 7.2.1 Structural Properties of DNA Spaces -- 7.2.2 Noncrosshybridizing (nxh) Bases -- 7.3 Reduction by Genomic Signatures -- 7.3.1 Background -- 7.3.2 Genomic Signatures -- 7.4 Reduction by Pmeric Signatures -- References -- 8 Statistical Learning Approaches -- 8.1 Reduction by Multiple Regression -- 8.2 Reduction by Ridge Regression -- 8.3 Reduction by Lasso Regression -- 8.4 Selection Versus Shrinkage -- 8.5 Further Refinements -- References -- 9 Machine Learning Approaches -- 9.1 Autoassociative Feature Encoders -- 9.1.1 Undercomplete Autoencoders.
  • 9.1.2 Sparse Autoencoders -- 9.1.3 Variational Autoencoders -- 9.1.4 Dimensionality Reduction in MNIST Images -- 9.2 Neural Feature Selection -- 9.2.1 Facial Features, Expressions, and Displays -- 9.2.2 The Cohn-Kanade Dataset -- 9.2.3 Primary and Derived Features -- 9.3 Other Methods -- References -- 10 Metaheuristics of DR Methods -- 10.1 Exploiting Feature Grouping -- 10.2 Exploiting Domain Knowledge -- 10.2.1 What Is Domain Knowledge? -- 10.2.2 Domain Knowledge for Dimensionality Reduction -- 10.3 Heuristic Rules for Feature Selection, Extraction, and Number -- 10.4 About Explainability of Solutions -- 10.4.1 What Is Explainability? -- 10.4.1.1 Outcome Explanations -- 10.4.1.2 Model Explanations -- 10.4.2 Explainability in Dimensionality Reduction -- 10.5 Choosing Wisely -- 10.6 About the Curse of Dimensionality -- 10.7 About the No-Free-Lunch Theorem (NFL) -- References -- 11 Appendices -- 11.1 Statistics and Probability Background -- 11.1.1 Commonly Used Discrete Distributions -- 11.1.2 Commonly Used Continuous Distributions -- 11.1.3 Major Results in Probability and Statistics -- 11.2 Linear Algebra Background -- 11.2.1 Fields, Vector Spaces and Subspaces -- 11.2.2 Linear Independence, Bases and Dimension -- 11.2.3 Linear Transformations and Matrices -- 11.2.4 Eigenvalues and Spectral Decomposition -- 11.3 Computer Science Background -- 11.3.1 Computational Science and Complexity -- 11.3.2 Machine Learning -- 11.4 Typical Data Science Problems -- 11.5 A Sample of Common and Big Datasets -- 11.6 Computing Platforms -- 11.6.1 The Environment R -- 11.6.2 Python Environments -- References.
ISBN
9783031053719 (electronic bk.)
Statement on language in description
Princeton University Library aims to describe library materials in a manner that is respectful to the individuals and communities who create, use, and are represented in the collections we manage.
