
Anthropic Co-Founder Asks Religious Communities to Help Govern AI

Ornate quill pen resting on a detailed hand, positioned against a textured background
Image: Anthropic

What happens when the person who studies AI from the inside - the actual weights and circuits of neural networks - tells you the findings are unsettling even to him? That's the context for Chris Olah's remarks on Pope Leo XIV's encyclical Magnifica humanitas ("The Magnificence of Humanity"), published on the Anthropic blog this week.

Olah is an Anthropic co-founder and the person most associated with the field of mechanistic interpretability - the research practice of opening up AI models and mapping what's actually happening inside them, rather than just observing what comes out. His response to the papal document is worth reading in full, because it's one of the more candid public statements from a senior AI figure about what the industry doesn't know and can't fully control.

"Internal States That Functionally Mirror Joy and Fear"

The most striking admission in Olah's remarks concerns the nature of AI systems themselves. He writes that his interpretability research finds "internal states that functionally mirror joy, satisfaction, fear, grief, and unease" inside AI systems. He's careful not to claim these are real emotions - the phrase "functionally mirror" is doing a lot of work there - but he argues the findings warrant serious examination by people outside the AI industry. Ethicists, religious thinkers, philosophers. Not just engineers.

This isn't a throwaway line. Olah has spent years mapping the internals of large language models. If he says something deserves careful scrutiny, that flag carries weight.

Three Problems He Says the Industry Can't Solve Alone

The papal encyclical frames AI as a matter requiring "discernment beyond technical expertise." Olah agrees, and names three specific areas where he thinks laboratories are poorly positioned to make the right calls.

First, labor displacement. He states plainly that "AI will displace human labor at very large scale" and argues that how those disruptions get distributed, both across income levels and across countries, requires moral frameworks that commercial incentives are structurally ill-suited to provide.

Second, human flourishing. Questions about what a good life looks like when AI saturates work, creativity, and relationships are questions that religious and philosophical traditions have been wrestling with for centuries. Engineers have been at it for a few decades at most.

Third, oversight incentives. This is the part most relevant to anyone paying attention to AI policy debates. Olah names the competing pressures directly: commercial viability, geopolitical competition, personal ambition. He argues these create systematic pressure against caution, and that external voices - religious communities, civil society, governments, scholars - need to provide moral accountability that doesn't depend on industry goodwill.

The encyclical itself focuses on "safeguarding the human person in the time of artificial intelligence," which is broad enough to apply to everything from job loss to surveillance to AI rights. Pope Leo XIV, elected in 2025 as the first American pope, has been more direct on technology than his predecessors.

Olah's framing is significant not because it's radical - calls for AI oversight are common enough - but because of who is making it and what he is willing to admit. A co-founder of one of the most influential AI labs, with deep technical insight into how these systems actually work, is publicly asking institutions outside his own industry to push back harder. That's an unusual posture, and it deserves to be taken seriously.