Introduction to AI Safety, Ethics, and Society.

Author
Hendrycks, Dan
Format
Book
Language
English
Edition
1st ed.
Published/Created
  • Boca Raton : Taylor & Francis Group, 2024.
  • ©2025.
Description
1 online resource (562 pages)

Details

Summary note
This book is a comprehensive and interdisciplinary introduction to AI safety. As AI threatens to transform society, it becomes increasingly important to understand the risks that AI poses and to learn what measures we can take to mitigate them.
Source of description
Description based on publisher supplied metadata and other sources.
Contents
  • Cover
  • Endorsements
  • Half Title
  • Title Page
  • Copyright Page
  • Contents
  • Introduction
  • SECTION I: AI and Societal-Scale Risks
  • CHAPTER 1: Overview of Catastrophic AI Risks
  • 1.1. INTRODUCTION
  • 1.2. MALICIOUS USE
  • 1.2.1. Bioterrorism
  • 1.2.2. Unleashing AI Agents
  • 1.2.3. Persuasive AIs
  • 1.2.4. Concentration of Power
  • 1.3. AI RACE
  • 1.3.1. Military AI Arms Race
  • 1.3.2. Corporate AI Race
  • 1.3.3. Evolutionary Pressures
  • 1.4. ORGANIZATIONAL RISKS
  • 1.4.1. Accidents Are Hard to Avoid
  • 1.4.2. Organizational Factors can Reduce the Chances of Catastrophe
  • 1.5. ROGUE AIS
  • 1.5.1. Proxy Gaming
  • 1.5.2. Goal Drift
  • 1.5.3. Power-Seeking
  • 1.5.4. Deception
  • 1.6. DISCUSSION OF CONNECTIONS BETWEEN RISKS
  • 1.7. CONCLUSION
  • 1.8. LITERATURE
  • 1.8.1. Recommended Reading
  • CHAPTER 2: Artificial Intelligence Fundamentals
  • 2.1. INTRODUCTION
  • 2.2. ARTIFICIAL INTELLIGENCE & MACHINE LEARNING
  • 2.2.1. Artificial Intelligence
  • 2.2.2. Types of AI
  • 2.2.3. Machine Learning
  • 2.2.4. Types of Machine Learning
  • 2.3. DEEP LEARNING
  • 2.3.1. Model Building Blocks
  • 2.3.2. Training and Inference
  • 2.3.3. History and Timeline of Key Architectures
  • 2.3.4. Applications
  • 2.4. SCALING LAWS
  • 2.4.1. Scaling Laws in DL
  • 2.5. SPEED OF AI DEVELOPMENT
  • 2.6. CONCLUSION
  • 2.6.1. Summary
  • 2.7. LITERATURE
  • 2.7.1. Recommended Resources
  • SECTION II: Safety
  • CHAPTER 3: Single-Agent Safety
  • 3.1. INTRODUCTION
  • 3.2. MONITORING
  • 3.2.1. ML Systems Are Opaque
  • 3.2.2. Motivations for Transparency Research
  • 3.2.3. Approaches to Transparency
  • 3.2.4. Emergent Capabilities
  • 3.2.5. Emergent Goal-Directed Behavior
  • 3.2.6. Tail Risk: Emergent Goals
  • 3.2.7. Evaluations and Anomaly Detection
  • 3.3. ROBUSTNESS
  • 3.3.1. Proxies in ML
  • 3.3.2. Proxy Gaming
  • 3.3.3. Adversarial Examples
  • 3.3.4. Trojan Attacks and Other Security Threats
  • 3.3.5. Tail Risk: AI Evaluator Gaming
  • 3.4. ALIGNMENT
  • 3.4.1. Deception
  • 3.4.2. Deceptive Evaluation Gaming
  • 3.4.3. Tail Risk: Deceptive Alignment and Treacherous Turns
  • 3.4.4. Power
  • 3.4.5. People Could Enlist AIs for Power Seeking
  • 3.4.6. Power Seeking Can Be Instrumentally Rational
  • 3.4.7. Structural Pressures Toward Power-Seeking AI
  • 3.4.8. Tail Risk: Power-Seeking Behavior
  • 3.4.9. Techniques to Control AI Systems
  • 3.5. SYSTEMIC SAFETY
  • 3.6. SAFETY AND GENERAL CAPABILITIES
  • 3.7. CONCLUSION
  • 3.8. LITERATURE
  • 3.8.1. Recommended Reading
  • CHAPTER 4: Safety Engineering
  • 4.1. RISK DECOMPOSITION
  • 4.1.1. Failure Modes, Hazards, and Threats
  • 4.1.2. The Classic Risk Equation
  • 4.1.3. Framing the Goal as Risk Reduction
  • 4.1.4. Disaster Risk Equation
  • 4.1.5. Elements of the Risk Equation
  • 4.1.6. Applying the Disaster Risk Equation
  • 4.2. NINES OF RELIABILITY
  • 4.3. SAFE DESIGN PRINCIPLES
  • 4.3.1. Redundancy
  • 4.3.2. Separation of Duties
  • 4.3.3. Principle of Least Privilege
  • 4.3.4. Fail-Safes
  • 4.3.5. Antifragility
  • 4.3.6. Negative Feedback Mechanisms
  • 4.3.7. Transparency
  • 4.3.8. Defense in Depth
  • 4.3.9. Review of Safe Design Principles
  • 4.4. COMPONENT FAILURE ACCIDENT MODELS AND METHODS
  • 4.4.1. Swiss Cheese Model
  • 4.4.2. Bow Tie Model
  • 4.4.3. Fault Tree Analysis Method
  • 4.4.4. Limitations
  • 4.5. SYSTEMIC FACTORS
  • 4.5.1. Systemic Accident Models
  • 4.6. DRIFT INTO FAILURE AND EXISTENTIAL RISKS
  • 4.7. TAIL EVENTS AND BLACK SWANS
  • 4.7.1. Introduction to Tail Events
  • 4.7.2. Tail Events Can Greatly Affect the Average Risk
  • 4.7.3. Tail Events Can Be Identified From Frequency Distributions
  • 4.7.4. A Caricature of Tail Events
  • 4.7.5. Introduction to Black Swans
  • 4.7.6. Known Unknowns and Unknown Unknowns
  • 4.7.7. Implications of Tail Events and Black Swans for Risk Analysis
  • 4.7.8. Identifying the Risk of Tail Events or Black Swans
  • 4.8. CONCLUSION
  • 4.8.1. Summary
  • 4.8.2. Key Takeaways
  • 4.9. LITERATURE
  • 4.9.1. Recommended Reading
  • CHAPTER 5: Complex Systems
  • 5.1. OVERVIEW
  • 5.2. INTRODUCTION TO COMPLEX SYSTEMS
  • 5.2.1. The Reductionist Paradigm
  • 5.2.2. The Complex Systems Paradigm
  • 5.2.3. DL Systems as Complex Systems
  • 5.2.4. Complexity Is Not a Dichotomy
  • 5.2.5. The Hallmarks of Complex Systems
  • 5.2.6. Social Systems as Complex Systems
  • 5.3. COMPLEX SYSTEMS FOR AI SAFETY
  • 5.3.1. General Lessons from Complex Systems
  • 5.3.2. Puzzles, Problems, and Wicked Problems
  • 5.3.3. Challenges With Interventionism
  • 5.3.4. Systemic Issues
  • 5.4. CONCLUSION
  • 5.5. LITERATURE
  • 5.5.1. Recommended Reading
  • SECTION III: Ethics and Society
  • CHAPTER 6: Beneficial AI and Machine Ethics
  • 6.1. INTRODUCTION
  • 6.2. LAW
  • 6.2.1. The Case for Law
  • 6.2.2. The Need for Ethics
  • 6.3. FAIRNESS
  • 6.3.1. Bias
  • 6.3.2. Sources of Bias
  • 6.3.3. AI Fairness Concepts
  • 6.3.4. Limitations of Fairness
  • 6.3.5. Approaches to Combating Bias and Improving Fairness
  • 6.4. THE ECONOMIC ENGINE
  • 6.4.1. Allocative Efficiency of Free Markets
  • 6.4.2. Market Failures
  • 6.4.3. Inequality
  • 6.4.4. Growth
  • 6.4.5. Beyond Economic Models
  • 6.5. WELLBEING
  • 6.5.1. Wellbeing as the Net Balance of Pleasure over Pain
  • 6.5.2. Wellbeing as a Collection of Objective Goods
  • 6.5.3. Wellbeing as Preference Satisfaction
  • 6.5.4. Applying the Theories of Wellbeing
  • 6.6. PREFERENCES
  • 6.6.1. Revealed Preferences
  • 6.6.2. Stated Preferences
  • 6.6.3. Idealized Preferences
  • 6.7. HAPPINESS
  • 6.7.1. The General Approach to Happiness
  • 6.7.2. Problems for Happiness-Focused Ethics
  • 6.8. SOCIAL WELFARE FUNCTIONS
  • 6.8.1. Measuring Social Welfare
  • 6.9. MORAL UNCERTAINTY
  • 6.9.1. Making Decisions Under Moral Uncertainty
  • 6.9.2. Implementing a Moral Parliament in AI Systems
  • 6.9.3. Advantages of a Moral Parliament
  • 6.10. CONCLUSION
  • 6.11. LITERATURE
  • 6.11.1. Recommended Reading
  • CHAPTER 7: Collective Action Problems
  • 7.1. MOTIVATION
  • 7.2. GAME THEORY
  • 7.2.1. Overview
  • 7.2.2. Game Theory Fundamentals
  • 7.2.3. The Prisoner's Dilemma
  • 7.2.4. The Iterated Prisoner's Dilemma
  • 7.2.5. Collective Action Problems
  • 7.2.6. Summary
  • 7.3. COOPERATION
  • 7.3.1. Summary
  • 7.4. CONFLICT
  • 7.4.1. Overview
  • 7.4.2. Bargaining Theory
  • 7.4.3. Commitment Problems
  • 7.4.4. Information Problems
  • 7.4.5. Factors Outside of Bargaining Theory
  • 7.4.6. Summary
  • 7.5. EVOLUTIONARY PRESSURES
  • 7.5.1. Overview
  • 7.5.2. Generalized Darwinism
  • 7.5.3. Levels of Selection and Selfish Behavior
  • 7.5.4. Summary
  • 7.6. CONCLUSION
  • 7.7. LITERATURE
  • 7.7.1. Recommended Reading
  • CHAPTER 8: Governance
  • 8.1. INTRODUCTION
  • 8.1.1. The Landscape
  • 8.2. ECONOMIC GROWTH
  • 8.3. DISTRIBUTION OF AI
  • 8.3.1. Distribution of Access to AI
  • 8.3.2. Distribution of Power Among AIs
  • 8.4. CORPORATE GOVERNANCE
  • 8.4.1. What Is Corporate Governance?
  • 8.4.2. Legal Structure
  • 8.4.3. Ownership Structure
  • 8.4.4. Organizational Structure
  • 8.4.5. Assurance
  • 8.5. NATIONAL GOVERNANCE
  • 8.5.1. Standards and Regulations
  • 8.5.2. Liability for AI Harms
  • 8.5.3. Targeted Taxation
  • 8.5.4. Public Ownership over AI
  • 8.5.5. Improving Resilience
  • 8.5.6. Not Falling Behind
  • 8.5.7. Information Security
  • 8.6. INTERNATIONAL GOVERNANCE
  • 8.6.1. Forms of International Governance
  • 8.6.2. Four Questions for AI Regulation
  • 8.6.3. What Can Be Included in International Agreements?
  • 8.7. COMPUTE GOVERNANCE
  • 8.7.1. Compute Is Indispensable for AI Development and Deployment
  • 8.7.2. Compute Is Physical, Excludable, and Quantifiable
  • 8.8. CONCLUSION
  • 8.9. LITERATURE
  • 8.9.1. Recommended Reading
  • Acknowledgments
  • References
  • Index.
ISBN
  • 9781003530336
  • 1003530338
  • 9781040261163
  • 1040261167