LEADER 01609nam a2200349 i 4500
001 99130866165406421
005 20240429183011.0
006 m o d |
007 cr cnu||||||||
008 240429s2024 enka o 000 0 eng d
020 1-80512-191-X
035 (MiAaPQ)EBC31267492
035 (Au-PeEL)EBL31267492
035 (CKB)31405603300041
035 (OCoLC)1429720163
035 (OCoLC-P)1429720163
035 (CaSebORM)9781805120100
035 (EXLCZ)9931405603300041
040 MiAaPQ |beng |erda |epn |cMiAaPQ |dMiAaPQ
050 4 Q325.5 |b.A484 2024
082 0 006.31 |223
100 1 Alves, Maicon Melo, |eauthor.
245 10 Accelerate Model Training with PyTorch 2.X : |bBuild More Accurate Models by Boosting the Model Training Process / |cMaicon Melo Alves and Lúcia Maria de Assumpção Drummond.
250 First edition.
264 1 Birmingham, England : |bPackt Publishing Ltd., |c[2024]
264 4 |c©2024
300 1 online resource (230 pages)
336 text |btxt |2rdacontent
337 computer |bc |2rdamedia
338 online resource |bcr |2rdacarrier
505 0 Cover -- Title page -- Copyright and credits -- Foreword -- Contributors -- Table of Contents -- Preface -- Part 1: Paving the Way -- Chapter 1: Deconstructing the Training Process -- Technical requirements -- Remembering the training process -- Dataset -- The training algorithm -- Understanding the computational burden of the model training phase -- Hyperparameters -- Operations -- Parameters -- Quiz time! -- Summary -- Chapter 2: Training Models Faster -- Technical requirements -- What options do we have? -- Modifying the software stack -- Increasing computing resources -- Modifying the application layer -- What can we change in the application layer? -- Getting hands-on -- What if we change the batch size? -- Modifying the environment layer -- What can we change in the environment layer? -- Getting hands-on -- Quiz time! -- Summary -- Part 2: Going Faster -- Chapter 3: Compiling the Model -- Technical requirements -- What do you mean by compiling? -- Execution modes -- Model compiling -- Using the Compile API -- Basic usage -- Give me a real fight - training a heavier model! -- How does the Compile API work under the hood? -- Compiling workflow and components -- Backends -- Quiz time! -- Summary -- Chapter 4: Using Specialized Libraries -- Technical requirements -- Multithreading with OpenMP -- What is multithreading? -- Using and configuring OpenMP -- Using and configuring Intel OpenMP -- Optimizing Intel CPU with IPEX -- Using IPEX -- How does IPEX work under the hood? -- Quiz time! -- Summary -- Chapter 5: Building an Efficient Data Pipeline -- Technical requirements -- Why do we need an efficient data pipeline? -- What is a data pipeline? -- How to build a data pipeline -- Data pipeline bottleneck -- Accelerating data loading -- Optimizing a data transfer to the GPU -- Configuring data pipeline workers -- Reaping the rewards.
505 8 Quiz time! -- Summary -- Chapter 6: Simplifying the Model -- Technical requirements -- Knowing the model simplifying process -- Why simplify a model? (reason) -- How to simplify a model? (process) -- When do we simplify a model? (moment) -- Using Microsoft NNI to simplify a model -- Overview of NNI -- NNI in action! -- Quiz time! -- Summary -- Chapter 7: Adopting Mixed Precision -- Technical requirements -- Remembering numeric precision -- How do computers represent numbers? -- Floating-point representation -- Novel data types -- A summary, please! -- Understanding the mixed precision strategy -- What is mixed precision? -- Why use mixed precision? -- How to use mixed precision -- How about Tensor Cores? -- Enabling AMP -- Activating AMP on GPU -- AMP, show us what you are capable of! -- Quiz time! -- Summary -- Part 3: Going Distributed -- Chapter 8: Distributed Training at a Glance -- Technical requirements -- A first look at distributed training -- When do we need to distribute the training process? -- Where do we execute distributed training? -- Learning the fundamentals of parallelism strategies -- Model parallelism -- Data parallelism -- Distributed training on PyTorch -- Basic workflow -- Communication backend and program launcher -- Quiz time! -- Summary -- Chapter 9: Training with Multiple CPUs -- Technical requirements -- Why distribute the training on multiple CPUs? -- Why not increase the number of threads? -- Distributed training on rescue -- Implementing distributed training on multiple CPUs -- The Gloo communication backend -- Coding distributed training to run on multiple CPUs -- Launching distributed training on multiple CPUs -- Getting faster with Intel oneCCL -- What is Intel oneCCL? -- Code implementation and launching -- Is oneCCL really better? -- Quiz time! -- Summary -- Chapter 10: Training with Multiple GPUs.
505 8 Technical requirements -- Demystifying the multi-GPU environment -- The popularity of multi-GPU environments -- Understanding multi-GPU interconnection -- How does interconnection topology affect performance? -- Discovering the interconnection topology -- Setting GPU affinity -- Implementing distributed training on multiple GPUs -- The NCCL communication backend -- Coding and launching distributed training with multiple GPUs -- Experimental evaluation -- Quiz time! -- Summary -- Chapter 11: Training with Multiple Machines -- Technical requirements -- What is a computing cluster? -- Workload manager -- Understanding the high-performance network -- Implementing distributed training on multiple machines -- Getting introduced to Open MPI -- Why use Open MPI and NCCL? -- Coding and launching the distributed training on multiple machines -- Experimental evaluation -- Quiz time! -- Summary -- Index -- Other Books You May Enjoy.
588 Description based on publisher supplied metadata and other sources.
588 Description based on print version record.
520 Dramatically accelerate the building process of complex models using PyTorch to extract the best performance from any computing environment. Key Features: Reduce the model-building time by applying optimization techniques and approaches. Harness the computing power of multiple devices and machines to boost the training process. Focus on model quality by quickly evaluating different model configurations. Purchase of the print or Kindle book includes a free PDF eBook. Book Description: Penned by an expert in High-Performance Computing (HPC) with over 25 years of experience, this book is your guide to enhancing the performance of model training using PyTorch, one of the most widely adopted machine learning frameworks. You'll start by understanding how model complexity impacts training time before discovering distinct levels of performance tuning to expedite the training process. You'll also learn how to use a new PyTorch feature to compile the model and train it faster, alongside learning how to benefit from specialized libraries to optimize the training process on the CPU. As you progress, you'll gain insights into building an efficient data pipeline to keep accelerators occupied during the entire training execution and explore strategies for reducing model complexity and adopting mixed precision to minimize computing time and memory consumption. The book will get you acquainted with distributed training and show you how to use PyTorch to harness the computing power of multicore systems and multi-GPU environments available on single or multiple machines. By the end of this book, you'll be equipped with a suite of techniques, approaches, and strategies to speed up training, so you can focus on what really matters--building stunning models! What you will learn: Compile the model to train it faster. Use specialized libraries to optimize the training on the CPU. Build a data pipeline to boost GPU execution. Simplify the model through pruning and compression techniques. Adopt automatic mixed precision without penalizing the model's accuracy. Distribute the training step across multiple machines and devices. Who this book is for: This book is for intermediate-level data scientists who want to learn how to leverage PyTorch to speed up the training process of their machine learning models by employing a set of optimization strategies and techniques. To make the most of this book, familiarity with basic concepts of machine learning, PyTorch, and Python is essential. However, there is no obligation to have a prior understanding of distributed computing, accelerators, or multicore processors.
650 0 Neural networks (Computer science)
650 0 Machine learning.
650 0 Python (Computer program language)
776 |z1-80512-010-7
700 1 Drummond, Lúcia Maria de Assumpção, |eauthor.
906 BOOK