Future of Life Institute Podcast

AIAP: An Overview of Technical AI Alignment with Rohin Shah (Part 2)


Show Notes

The space of AI alignment research is highly dynamic, and it's often difficult to get a bird's-eye view of the landscape. This podcast is the second of two parts attempting to partially remedy this by providing an overview of technical AI alignment efforts. In particular, this episode continues the discussion from Part 1 by going into more depth on the specific approaches to AI alignment. In this podcast, Lucas spoke with Rohin Shah. Rohin is a fifth-year PhD student at UC Berkeley with the Center for Human-Compatible AI, working with Anca Dragan, Pieter Abbeel, and Stuart Russell. Every week, he collects and summarizes recent progress relevant to AI alignment in the Alignment Newsletter.

Topics discussed in this episode include:

  • Embedded agency
  • The field of "getting AI systems to do what we want"
  • Ambitious value learning
  • Corrigibility, including iterated amplification, debate, and factored cognition
  • AI boxing and impact measures
  • Robustness through verification, adversarial ML, and adversarial examples
  • Interpretability research
  • Comprehensive AI Services
  • Rohin's relative optimism about the state of AI alignment

You can take a short (3 minute) survey to share your feedback about the podcast here.

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, SoundCloud, iTunes, Google Play, Stitcher, or your preferred podcast site or application. You can find all the AI Alignment Podcasts here.

Recommended/mentioned reading

Value Learning sequence

Embedded Agency sequence

Iterated Amplification sequence

AI Alignment Newsletter database

Reframing Superintelligence: CAIS as General Intelligence

Guidelines for AI Containment

Penalizing side effects using stepwise relative reachability

Towards a New Impact Measure

Techniques for optimizing worst-case performance

Cooperative Inverse Reinforcement Learning

Deep reinforcement learning from human preferences

Inverse Reward Design

Clarifying “AI alignment”

Supervising strong learners by amplifying weak experts

AI safety via debate

Factored Cognition

The Building Blocks of Interpretability

Feature Visualization

Good and safe uses of AI Oracles

You can learn more about Rohin’s work here and follow his Alignment Newsletter here.