Podcast: Linear Digressions
Episode: Unfaithful Chain of Thought

Category: Technology
Duration: 00:24:32
Publish Date: 2026-04-13 01:00:17
Description: What's actually happening when an LLM "thinks out loud"? Research on human decision-making suggests that much of the reasoning we believe drives our choices is actually post hoc rationalization: we decide first, explain later. Katie and Ben get curious about whether the same might be true for large language models: when you watch a model reason through a problem in real time, is that chain of thought the genuine process, or just a plausible-sounding story told after the fact? It's a deceptively deep question with real stakes for how much we should trust model explanations.

Miles Turpin et al., "Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting" (NeurIPS 2023, NYU and Anthropic): https://arxiv.org/abs/2305.04388

Anthropic, "Reasoning Models Don't Always Say What They Think" (Alignment Faking research, 2025): https://www.anthropic.com/research/reasoning-models-dont-say-think
