EPISODE 2026-06-02

AI:AM LIVE — June 2, 2026 — Self-Improving Tax Agents and Catholic AI: Arthur Fernandes Araujo, John de Wasseige, Matthew Harvey Sanders

Day two of AI in the AM, live from the morning of June 2, 2026. OpenAI forward-deployed engineers Arthur Fernandes Araujo and John de Wasseige walked through the self-improving tax agent they built with Codex for thirty-plus accounting firms — what the self-improvement loop actually is, what changed for the accountants, and where the profession is headed. Nathan brought field notes from the Recursive RSI conference in San Francisco, speed-running four papers on personas, metagaming, accidental chain-of-thought training, and natural language autoencoders. Matthew Harvey Sanders, CEO of Longbeard and builder of Magisterium AI, joined from Rome a week after attending the Vatican release of Pope Leo XIV's first encyclical on AI, and made the case for why sovereign Catholic AI is not optional.

▶ Full show on YouTube 𝕏 Live broadcast

AI in the AM is a live weekday morning show on AI. Day two paired two of OpenAI's forward deployed engineers with the builder of the world's first Catholic AI — the automation of expert work on one side, what AI means for the Church on the other — with field notes from the Recursive RSI event in between.

The rundown

2:55Opening37 min
Opening — news and discussionDay-two reflections on the show's recursive self-improvement philosophy, plus the morning's news: Anthropic's confidential S-1 filing, Google's unprecedented $80B equity raise with Berkshire Hathaway, Bernie Sanders' proposal for a national equity stake in AI companies, OpenAI models landing on AWS Bedrock, and signs that AI2 may be breaking up. Nathan and Prakash debate whether taxes or equity stakes better serve the public interest as AI wealth concentrates.
Watch
As aired
Nathan and Prakash open their second show day reflecting on the recursive self-improvement philosophy of AI:AM — using AI tooling to preprocess information, accelerate their own learning, and produce durable artifacts that serve even plugged-in insiders who can't tune in live. Nathan frames the project as building a harness for themselves: human 'proto-AGIs' whose natural affordances are slow, and whose goal is at least an order-of-magnitude speed-up through AI-preprocessed views of current events.
Prakash walks through the overnight news: Google's unprecedented $80 billion equity raise (first since IPO, with Berkshire Hathaway putting in $10 billion at a slight discount), Anthropic's confidential S-1 registration with the SEC announced publicly on its blog, Sam Altman sparring with Anthropic on CNBC, and signs that AI2 (Allen Institute) may be breaking up as key researchers including Nathan Lambert depart. OpenAI also went live on AWS, deepening the intertwined capital relationships across the AI ecosystem.
The hosts debate Bernie Sanders' proposal for a national equity stake in major AI companies. Nathan is sympathetic — arguing that as AI mega-corps approach a substantial fraction of US national wealth while potentially paying little in corporate tax (via endless CapEx reinvestment), the public deserves some upside. Prakash pushes back, contending that taxes already represent a superior, non-dischargeable government claim compared to minority equity stakes, and that Bernie's proposal adds on top of rather than replacing existing government take — following the same 'additive' failure mode as many UBI proposals. The debate closes on an open question about what a fair tax base even looks like if AI displaces the payroll that drives income-tax revenue.
Key moments
We're building a harness for ourselves where we are, in some sense, little proto-AGIs — but our affordances are, by default, kind of slow. Creating these skills to preprocess information and present them in a hopefully effective and time-efficient way — we need to be aiming for at least an order of magnitude speed-up relative to what we could do on our own.
Nathan Labenz13:46
A livestream is producing the content itself in bulk, but the refinement and clipping process is what converts it — and I look at this as primary historical research, because we are talking to people who are actually in the flow and are making history in the last three or four years.
Prakash11:44
Taxes are super-equity — non-dischargeable in bankruptcy, and the board and management don't get to decide whether they want to pay taxes or not. Unlike dividends or buybacks, which the board can choose not to do, taxes are really powerful. Bernie already has a stake through taxes; the magic trick is that he wants more than what he already has.
Prakash30:29
Related
Anthropic S-1 announcement ↗OpenAI on AWS Bedrock ↗
Full transcriptLightly edited · timestamps jump to YouTube
2:56
Nathan Labenz: So yesterday we literally made changes to the studio using Codex in the background while we were live — and our very first guest had to refresh the page to get audio from me. Prakash has vibe-coded this entire thing, and it's pretty cool. I think there's going to be a lot of value both in the utility of doing this all ourselves, being able to customize it just the way we want, and in showing how we're doing it. We both took a bunch of notes yesterday from what I would say was, in some ways, a good first-pace mile in the sprint through the AI
3:41
marathon — but in some ways it was a bit rough. So we've both been prompting away, developing the skills to make sure that every day we come back and do this, we'll have a better, tighter format. It'll be interesting to watch this thing evolve and see if we can get to the point where we feel like we're really hitting a stride with a sustainable pace that serves us and the audience. One of the things I really want to make sure for myself is that I'm not becoming a content person in the short sense. I really want to make sure I'm still focused on learning and understanding as much and as well as I possibly
4:26
can — and building in public. Genuinely sharing the process by which we're doing this, I think, is going to be a huge part of that. I was going off on my own earlier describing the recursive self-improvement philosophy of this project. For me — and I think you have a slightly different angle on this — I really want to think of this as live reporting. One of the reflections I had on my run yesterday was: I want to make sure I'm making something of value to true
5:11
insiders. I think of somebody like Dean Ball — friend of the show — or Zvi, who I think reads only and doesn't really watch much video unless he has to for review purposes. I want to make artifacts that are hopefully of value to people who are really plugged in. In that sense, what we're doing live is a kind of live reporting, but there's going to be all this apparatus around the live engagement that we're gradually building out, that hopefully produces additional artifacts that aspire to be useful for people beyond those who
5:56
can actually tune in live. I think it's going to be tough to get too many AI engineers — say, OpenAI engineers — watching two hours a day. So how do we both engage people live, make sure what we're doing is actually serving our own understanding and advancing the sense we can make of everything going on, and cash that out into artifacts that can be shared with a little more durable value? Those are my big reflections from day one. And certainly many prompts have flown to try to instantiate all of that, even just in the
6:41
interval between yesterday and today.
6:45
Prakash: Indeed. And, as always, for me the most interesting thing is that so much happens every single day now. There is a lot of value in having a worldview today and updating it in a week or so and seeing how much you've moved — because I find that a lot of people don't update their priors. Gary Marcus being one of them.
7:23
Nathan Labenz: Gary, if you're listening, your ears might be burning.
7:26
Prakash: Well, like many people, I've been blocked by him. At some point in the last three years, if you've teased Gary too much, he's heavily blocked you.
7:41
Nathan Labenz: It's funny — I don't think he's blocked me. And arguably I deserved it because one of the first things I ever wrote publicly online, in a moment where I just felt like I had something to say, was when he was on the Ezra Klein podcast. We didn't have as much time yesterday as I had hoped to talk about Ezra's piece and the importance of having a vision — I think Ezra has been really good on that stuff. The Gary Marcus episode felt like a miss to me because it was full of: AI
8:27
can never do this, can never do that. And I was so frustrated that I went off and wrote a hundred-tweet thread basically attacking his appearance and correcting the record on everything he said that I didn't think was accurate. That was the first thing I ever put online in a meaningful public way, and it ended up being much more viral than I anticipated — quite the dunk. But to me, he's been actually quite gracious. I haven't chased him down online every time he's moved the goalposts, which I think has continued to happen probably dozens of times. For the record, he's been gracious to me, and I'm still not blocked.
9:22
Prakash: I deserved it, though.
9:26
Nathan Labenz: I honestly think I deserved it too, but I didn't get it.
9:29
Prakash: I normally refrain from that kind of thing, but it was just too tempting a target. All I did was quote-tweet something he had posted several months before with a Wile E. Coyote cartoon — you know, the Coyote falling off the cliff. He blocked me for that.
9:54
Nathan Labenz: Well, we've got a bunch of news to cover today to warm up. Any thoughts or reflections on takeaways from day one, and what are your top priorities for recursive self-improvement right now?
10:14
Prakash: I think we have a pipeline of work that starts from opening dates and inviting guests, confirming guests, and then getting all the research ready. We do a lot of AI-assisted research for these shows, and we're happy to share with the audience what we do. Over time we've both refined the prompts we use for the AI. And then we have the stream itself, and then
10:59
we have post-stream. We have a lot of artifacts to cover — all the channels and clipping. If you're not in the new media space: what's happening right now is that the thirty-second to one-minute clips get something like ten to a hundred times the distribution of longer formats. Clips have really become the financial engine behind most of these new media shows. When people talk about new media, what they're really talking about is clippable media being broadcast to social channels. A
11:44
livestream is producing the content itself in bulk, but the refinement and clipping process converts it. And I look at this as primary historical research, really, at this point — because we are talking to people who are actually in the flow, who are making history in the last three or four years. That becomes clips that go out. Our hope is that this whole cycle turnaround really speeds up, which is what enables the actual interactivity of the medium. And what we found is that your words really
12:29
have power on social media. If you have a take and you throw it out there, and you have a following and you're loud enough, it reaches a lot of people. New media clips are basically replacing what people used to get from news sources of all kinds, and people are selecting new sources that are meaningful to them. To the extent we can shorten that cycle time so we're in the flow while things are happening instead of in retrospect — that would be ideal. And
13:14
that's where the recursive self-improvement part comes in, because you can recursively self-improve the AI field itself by feeding that information back in, educating people, pushing the message out. That is really my idea of how we could help — not just the show, but the entire field. Very ambitious, I know. Nathan, what do you feel?
13:46
Nathan Labenz: I think one way to think about what we're building right now is a harness for ourselves where we are, in some sense, little proto-AGIs — but our affordances are, by default, kind of slow. It's hard to process all the information we need to process, hard to stay on top of all the discussions. So creating these skills to preprocess information and present it in a hopefully effective and time-efficient way, set up a useful and easy-to-follow discussion — doing that, ideally,
14:31
we need to be aiming for at least an order of magnitude speed-up relative to what we could do on our own. And obviously the core challenge is going to be our own human brains. I can speak for myself: I can have AI agents go out and do all kinds of information preprocessing, but I don't necessarily learn from that — so I need to hold myself accountable to actually learning as much with AI doing more of the work as I was when I was doing more of the work myself. The jury's still out on exactly how to accomplish that, but that has to be the goal, because there's just this
15:17
time-dilation effect — as we're speeding up, we have to find some way to slow down time to make sense of everything we're passing by. This harness that allows us to drop into an effectively preprocessed view of current events is my best idea right now for how to speed up my own learning to the point where it can hopefully keep up, or at least get closer to the singularity. So with that, maybe we should cover some of today's outline. Anything else you want to add first?
15:58
Prakash: Yeah, let me share the overnight news. So — the major thing that happened overnight: Google decided to do an $80 billion fundraise. Let me pull it up on screen. This is
16:44
Google's stock buybacks per year — increasing year by year from 2015 through 2024 — and then this year, for the first time since the IPO, they sold stock. Tech companies typically sell convertible bonds because founders don't want pure debt: when you go bankrupt, that debt haunts you. So they use convertible bonds at near-zero interest rates.
17:29
Instead, Google issued straight equity. There are a bunch of takes on this online. First thing to note: Berkshire Hathaway has agreed to put in 12.5% of the $80 billion — so Berkshire is putting in $10 billion. Warren Buffett currently has about $400 billion of cash on the balance sheet. He's sold off a bunch of companies, thinks valuations are high, is waiting for better valuations — but he put in $10 billion here, which is a big chunk.
18:14
He now has a 7% allocation to Google in his portfolio with some dilution protection — a slightly discounted rate from the current price. As a result, Google stock took a small plunge, maybe 3% down. It's a large company: $80 billion in the context of a several-trillion-dollar company isn't huge, but it shows the capital requirements hitting even the largest players. And the decision to raise equity rather than debt means that if you're raising more equity now, you're going to raise three or four times
18:59
that amount of debt later. So there's a debt issuance coming — probably another couple hundred billion of debt for Google. That's potentially a quarter trillion of capital Google is taking in in one shot. And then yesterday, Anthropic filed an S-1 — a confidential registration statement with the SEC saying: we're thinking about going to IPO, here are our numbers, but we're not telling anyone else the numbers yet. And not only did they file confidentially, they announced it on the blog, which raises the question: why? And thirdly, Sam Altman was on CNBC talking
19:45
smack about Anthropic — essentially: I don't know why they have to say all these bad things; like, everyone's going to lose their jobs six months ago, and now they're not talking about it. What we're having right now is a race for capital. There's a sense in the market that capital is scarce or will become scarce. When capital becomes scarce, the price of capital goes up. I think the Google issuance is a sign that the price of capital is starting to go up, and they are issuing equity now so they can issue debt at a lower price later. So we seem to be having higher inflation rates but also higher returns
20:30
on stocks. And if you're a working person who isn't directly invested in stocks — maybe you have a pension fund — you're not getting the benefit of this; you're watching gas prices and inflation go up. One other thing worth noting: Berkshire is in for $10 billion now, so he's probably going to be in for another hundred billion later. You might see Berkshire become one of the primary funders of this AI infrastructure. And then separately: margin loan balances — when you take
21:16
a margin loan from a brokerage to buy stock, your returns go up but if stocks go down, you face a margin call with no limitation of liability. Retail brokerage margin loan stats are at all-time highs — more than $100 billion just at one brokerage — meaning across the market you have trillions of dollars of margin exposure. And finally, AI2 seems to be breaking up. They were a very well-regarded, strongly open-source American group that disclosed a lot while training their models. One key question has always been: can they keep their talent? It seems like Nathan Lambert, one of the leaders there, is leaving, and I saw another post from someone else also departing. So one more US open-source champion appears to be breaking apart. Back to you, Nathan — comments on the overnight news.
22:46
Nathan Labenz: Definitely the Anthropic confidential S-1 filing stands out. Some of the interesting commentary: one of the more provocative posts was basically saying at a trillion dollars it's obviously a steal, so the opening might be significantly more than that. I'd be interested to hear what you think the fair opening price for Anthropic might be. Obviously that's going to be driven by sentiment as much as fundamentals — fundamentals are strong, but the sentiment might be even stronger. And
23:31
then at the same time, we have Bernie Sanders taking a lot of positions on AI — some of them arguably contradictory — including this idea that there should be some national equity stake in the biggest tech companies. I'm genuinely sympathetic to that idea on some level. Dean Ball, friend of the show, has been critiquing him repeatedly for
24:06
the contradictory aspects of his portfolio of positions — like: I want an equity share for the public, but also we can't have any more data centers. That doesn't seem to fit together coherently. But setting that aside: American wealth is estimated at something like $165 trillion across all major asset classes — basically a little more than five times GDP. And you look at just the market cap of the five or ten biggest companies:
24:51
you're starting to get a not insignificant share of national wealth bound up in five to ten companies. Just the top three — NVIDIA, Apple, Google — gets you to about $15 trillion. Those three companies are worth almost 10% of America's entire national wealth. And certainly the AGI-pilled among us think that might continue even more dramatically in the not-too-distant future. If you add Anthropic at a few trillion and OpenAI at a few trillion — and it's no longer crazy to speculate that Anthropic could be a $10 trillion company in a few years — then yeah, it's a little concerning to
25:36
have that much wealth bound up in so few hands. So I'm pretty sympathetic to the Bernie argument that there should be some sort of public share. I'm also very sympathetic to Vitalik Buterin who said: man, this sucks. The US population is only 4% of the global population. These companies were started to benefit all humanity, and now it's a good IPO for US shareholders and they'll give some away through foundations or whatever. Bernie's got this idea of a national wealth fund, which could be good. But what about the other 96
26:22
percent? I think that discourse is going to be really interesting to watch. Another news item: OpenAI is now available on AWS.
26:34
Prakash: Mm-hmm.
26:35
Nathan Labenz: This continues the trend of everybody being in bed with everybody — everybody's fates being tied
26:43
Prakash: Mm-hmm.
26:44
Nathan Labenz: one way or another, through balance sheets, rev shares, all of it. It seems like we've kind of backed into creating one big AI mega-corp where a failure of any one of them seems like it could have massive contagion — or at least significant risk of that. I don't think that's super likely; I think the demand is going to be there barring some external shock. But once you count all the companies deeply intertwined in all these ways, you
27:29
probably have something that's already a $30 trillion mega-corp that is almost certainly too big to fail. So I think that's another thing that makes me sympathetic to the Bernie argument: whether or not the government recognizes it, I think at this point they are the financier of last resort for the AI mega-corp. Because if one day OpenAI can't make their payments for whatever reason, there's going to be some step-in bailout moment — we're not going to let this contagion run. So,
28:14
yeah, they'll come in and patch it up somehow. You understand those machinations better than I do. But if we are on the hook for that as the public, then I do think we should probably have some better claim on the upside than, you know, a Larry Summers-flavored board that is going to give out some grants. So: are you at all sympathetic? And how would you shape the Bernie proposal to make it something that would actually be a winner?
28:58
Prakash: Bernie is a socialist. One of the things socialists do is pull the wool over your eyes a bit. The fact of the matter is US corporate tax rates are around 20%. I would trade 20% of Anthropic stock for 20% of the taxes — and I would do better on the 20% of
29:44
the tax take than he does on 20% of the stock. Taxes are super-equity. They are non-dischargeable in bankruptcy. And they're better than preferred equity, because with preferred equity you have to pay out dividends, whereas normal equity you don't. Anthropic can say: look, I'm not going to pay out any dividends; that's up to me and my equity holders. They can do that. If you have taxes, they can't do that — they have to pay the taxes, full stop. And the taxes come out before everything, so when you look at the cash
30:29
flow you get from equity versus the cash flow from taxes, the cash flow from taxes is more certain and non-dischargeable in bankruptcy. There are far fewer ways to play games with it. And the board and management don't get to decide whether they want to pay taxes or not — unlike dividends or buybacks, which the board can choose not to do, or choose to reward themselves instead. Shares are simply not as powerful as taxes. Taxes
31:14
are really powerful. What Bernie could say is: we'll just increase corporate taxes; or we'll structure a progressive ladder so that windfalls are taxed at higher rates. All of those are options. Instead, he's doing 50% — and crucially, that's in addition to taxes. That's the magic trick. The government already has a stake through taxes. The magic trick is that he wants more than what he already has. And this is the illusion that
31:59
all these socialist proposals share — they are all additive. It's the same reason the UBI proposal hasn't worked out: the original libertarian UBI proposal was a replacement of current welfare with UBI, so you didn't have to means-test because the bureaucracy of means-testing is more expensive than some of the welfare you actually give out. But a pure replacement means people currently getting food stamps get no new dollars — all the new dollars go to people who weren't eligible for food stamps and don't need them. And all of the
32:44
people who support this extra money want additional dollars, not a bureaucratic replacement of existing dollars with a new structure. Same thing with Bernie: he doesn't want replacement of the existing, he wants additional dollars — an additional stake. And he doesn't want to go through Congress, because the normal path for allocating taxes goes through Congress. So
33:29
instead he's doing this end-run to get shares, which would then be run by a bureaucracy — a sovereign wealth fund that, frankly, politicians on both sides are pursuing. It's a slush fund where you can appoint your friends into positions and direct flows of money without Congress. Congress has the power of the purse — use Congress. This is what I find so frustrating, because the US has this marvelous institutional structure. And both parties basically don't want to use it. Congress doesn't want to use its power of taxes or legislation — it backs away and lets the executive run circles around it. Very
34:15
annoying. Very annoying.
34:38
Nathan Labenz: Let me challenge you on that one more time, because I think it'll be an interesting segue into our first guests — two forward-deployed engineers from OpenAI who are creating amazing things and potentially changing a lot of organizations. On the taxes front: well, wait a second. These companies probably aren't going to turn much of a profit. They're going to do the Amazon strategy — show very little profit for the foreseeable future, reinvesting in data centers, in robotics. They're already talking about Dyson spheres. They're going to put data centers in space. CapEx is essentially endless. It's not hard to imagine that in five or ten years
35:23
you have an AI mega-corp — whether explicitly merged or just intertwined enough — collectively worth 50% or more of US national wealth. And yet no profits were actually made, so no taxes were actually paid. Meanwhile, the companies do what platform founders have done in their personal lives: not pay a lot
36:09
of taxes.
36:10
They continue to hold the equity, borrow against it, and basically live a low-tax lifestyle. So I remain sympathetic to Bernie on the level that — and I think you're being very kind to the US institutional structure, which on paper is excellent, but as it's actually playing out in practice it's not going so great. Pretty consensus opinion at this point. So the most charitable reading of Bernie on this specific proposal — setting aside the anti-data-center side, which makes it harder to square — is the pro-growth Bernie: let's align public interest with these private companies. Exactly how that gets paid out or turned liquid, maybe the government starts borrowing against its shares. But in some ways
36:55
what I like about it is that it seems to align the public's interest with where this technology is going — which is, presumably, to the moon, literally, in as much as we're going to have data centers in space.
37:43
Prakash: Well, the cash has to go somewhere. When Amazon spends money, the cash doesn't disappear — it ends up being spent on other things, and those people spend on other things. The government doesn't need to take tax dollars from Amazon; it can take them from the Amazon employee. So the question of tax incidence matters: the incidence may not fall on Amazon, but the dollar is circulating in the economy, and the government has a take on sales taxes, income taxes.
38:28
And Bernie's preference is that all the money gets spent on employees anyway — and that's where the income tax hits. So why would you care whether the company makes corporate profit and pays taxes on it, when all of the money gets allocated to other entities which end up paying taxes anyway?
38:51
Nathan Labenz: We may need to switch gears to our first guests — but how many employees does Amazon have? A million or more?
39:00
OpenAI, as we're about to hear, is hiring forward-deployed engineers. But I don't think we're going to see a million forward-deployed engineers. So it still leaves the question: what would you want to tax? A token tax? A data center tax? A replacement tax of some kind if our AI tax bot does the work of large numbers of accountants? I don't have an answer to that, but it doesn't seem like we can naively say it'll all come in as payroll taxes, because I don't know how much payroll is going to scale with the economic activity powered by AI. And with that, I should
39:46
invite you to welcome our first guests today to our broadcast.
39:52
Prakash: So —
39:53Interview39 min
Self-Improving Tax Agents with CodexArthur Fernandes Araujo John de WasseigeArthur Fernandes Araujo and John de Wasseige, forward deployed engineers at OpenAI, explain the tax agent they built with Codex for Crete Professionals Alliance — roughly 7,000 returns processed, a third of prep time saved, and field completion jumping from 25% to 86% in six weeks. They unpack what 'self-improving' actually means (the harness, not the model), how practitioner corrections become durable eval fixes, and why skills get deprecated as newer models grow more capable. Nathan and Prakash debrief on the Jasper pattern and whether OpenAI will eventually go direct-to-market.
Watch
As aired
Prakash introduced John de Wasseige and Arthur Fernandes Araujo, both forward-deployed engineers at OpenAI who built a tax-preparation agent using Codex for Crete Professionals Alliance, a roll-up of accounting firms under Thrive Holdings. The system processed roughly 7,000 returns over six months, cut preparation time by a third, and saw the share of returns reaching at least 75% field completion jump from about 25% to 86% in just six weeks.
The conversation centered on what 'self-improving' actually means in practice. Arthur clarified that the model itself is not self-modifying; rather, the harness around it — the skills, instructions, and structured memory — evolves as practitioners correct the agent's output. John added that some skills become obsolete as newer models grow more capable, so the loop also involves pruning the harness to stay aligned with what the underlying model already handles well. Nathan connected this to the 'bitter lesson' pattern: model upgrades periodically subsume accumulated heuristics, requiring a kind of housecleaning before the next layer of complexity is tackled. Both guests emphasized that robust evals — being able to replay historical data against a new harness — were essential for having conviction that changes were actually improvements.
Nathan and Prakash pressed the guests on the broader economic implications: with prep time compressed by a third and throughput up 50%, what happens to the practitioners whose work is being automated? John noted that one senior accountant who had spent roughly 180 hours on operations last season spent only about 15 this year, and that the team's observation was heightened motivation, not anxiety, because people could focus on higher-value advisory work. Arthur framed Thrive's model as transformation 'from the inside out' rather than an external disruption — buying and operating the firms rather than pitching software at them — which he argued is both a faster path to deployment and a more sustainable one. Prakash and Nathan closed by noting that the forward-deployed playbook is itself temporary: whatever the FDE team learns at the frontier tends to get absorbed into the next model generation, raising the recurring question of whether OpenAI will eventually go direct-to-market with a standalone tax product.
Key moments
We're not really talking about self-improving the model itself — it's mostly the harness around it. What is improving is essentially the harness: the skills, the durable artifacts, so the system cannot make that same mistake in the future.
Arthur Fernandes Araujo42:49
One senior accountant who spent around 180 hours on operations last season spent about 15 this year. The gap is enormous. The feedback we received was very positive because they can focus on the higher-spectrum tasks where they actually bring more value.
John de Wasseige59:59
Always take an iterative approach: take a slice you can measure well and that has a meaningful impact for your users, then expand the perimeter — and bring the domain experts along on the journey. That's the general advice.
Arthur Fernandes Araujo1:10:44
Questions asked
42:49What does 'self-improving' actually mean in the TaxAI system?
Arthur clarified that the model itself does not rewrite itself — what improves is the harness: the skills, instructions, and structured memory around the model. When practitioners correct a draft, those corrections, along with production traces, become evals; Codex then investigates the failures and proposes fixes to skills or prompts so the same mistake isn't repeated. John added that some skills become obsolete as newer models grow more capable, so the loop also involves pruning the harness over time.
50:27How do you capture high-signal feedback from practitioners without drowning in raw trace data?
Arthur described a 'macro eval' approach — a middle path between just comparing input to ground truth and logging an exhaustive application trace. Rather than watching a full video of user behavior (which demands scarce human attention), the system captures the highest-signal moments in the user journey: corrections, field edits, and reconciliation steps. These intermediate signals give enough context to improve the harness without being too noisy to interpret.
59:59How do the tax professionals actually feel about having most of their prep work automated?
John reported broadly positive sentiment. One senior accountant went from roughly 180 hours of operations to about 15 in a single season. Practitioners felt empowered rather than displaced because they were treated as co-developers — their feedback shaped the system daily, and Thrive staff regularly worked on-site with them. Freed-up time shifted toward higher-value advisory work and the ability to take on more clients during peak season, which they found more motivating, not less.
54:02What has changed between the harness of a few months ago and today, and does that feedback flow back into model training?
John confirmed that both the harness and the underlying model have evolved significantly. The team tracks where the model struggles — whether in specific tax concepts or in handling certain harness patterns — and brings those observations back to the internal post-training team. The goal is always to develop the harness in a way that goes with the model's strengths rather than compensating for weaknesses that will soon be addressed in training.
1:08:38What is the ideal forward-deployed engineering strategy for other industries attempting this playbook?
Arthur's core advice was: take an iterative, measurable slice of the problem rather than attempting to tackle the whole domain at once. He emphasized that the approach most likely to fail is deploying something with a high error rate across the full problem space. He also stressed the importance of deep integration with domain experts — noting that he and John don't even file US taxes themselves, so they depended entirely on the practitioners. Whatever is deployed must feel native to the practitioners' existing workflow, and they must be brought along as active participants rather than passive end-users.
Related
OpenAI: Building Self-Improving Tax Agents with Codex ↗OpenAI Deployment Company announcement ↗
Full transcriptLightly edited · timestamps jump to YouTube
39:54
Prakash: Our first guests today are John de Wasseige and Arthur Fernandes Araujo. They are both forward-deployed engineers at OpenAI, and they have been working on TaxAI — specifically with Thrive Holdings and Crete Professionals Alliance. What they've been doing over the last six months, as we went into tax season in April, is build a TaxAI system that managed to process roughly 7,000 claims.
40:39
Prakash: They've seen a marked improvement in how many cases the AI they built can handle. It's also interesting because Thrive has been working on a strategy that many people in the space are trying to deploy — what we call the roll-up of service firms. The idea is that AI can now take on service-type jobs like accounting. But currently, all of these service firms are fragmented into thousands of firms worldwide, each with several partners and staff.
41:24
Prakash: Now for the first time, you can maybe merge all of these into larger firms, and instead of selling software subscriptions, you can sell the service itself to other companies and other people. So without further ado — John, Arthur, welcome to AI in the AM.
41:50
John de Wasseige: Hey, Prakash. Hey, Nathan. Thanks for having us.
41:53
Arthur Fernandes Araujo: Yeah, thanks for having us.
41:55
Prakash: Let me just ask you — most people hear 'tax AI agent' and assume the hard part is making the model smart enough to read tax documents. But in your TaxAI post, it's almost the opposite. The system got better because the product was engineered so that accountant corrections became evidence. The numbers you had were 7,000 returns processed, a third of prep time saved, throughput up about 50%, and within six weeks the share of scored returns reaching at least 75% field completion moved from about a quarter to 86%. So the question became not just 'can AI prepare a tax return?' but 'can a production
42:40
Prakash: workflow teach an agent what to improve without engineers hand-debugging every failure?' How did you guys do that?
42:49
Arthur Fernandes Araujo: A good point to clarify before we dive in is what 'self-improving' actually means here. We're not talking about self-improving the model itself — it's mostly the harness around it. This workflow is a good proving ground for that: you have very messy inputs, a lot of practitioner judgment built into the review workflows, but you have a very reliable way to measure outcomes. What is improving is essentially the harness that the model
43:34
Arthur Fernandes Araujo: leverages in order to produce the extractions.
43:43
Prakash: When you say harness, are you referring to something like — every time you hit an edge case, the humans help the model figure it out, and that becomes part of a memory of heuristics you then apply the next time you encounter that edge case? Is that what's happening?
44:03
Arthur Fernandes Araujo: Yes. When I talk about the harness, we leverage Codex to do a lot of the work, but it encompasses the set of instructions, skills, the data available, and the specific way you invoke all of that — that's the TaxAI agent. When you encounter edge cases — and this is exactly what we document in the blog post — the goal is to ensure that, like a good coworker being given a correction, the system won't make the same mistake next time. So it's about changing the structure of what Codex uses: the
44:48
Arthur Fernandes Araujo: skills, the durable artifacts — so it cannot make that mistake in the future.
44:55
Prakash: When you say skills, is it literally the Codex skills people are building right now? You use a skill creator, say 'here is a 1040 form, here's what I want you to do with it,' work through it, and when something goes wrong you fix it and it documents that in the skill? Is that what's happening?
45:20
John de Wasseige: Yes — it's the same skills you and I know from using Codex. What's interesting is that over time the models themselves have improved. What used to require a skill two or three months ago, today the model might just be able to handle on its own — so we can deprecate it. Part of the self-improving loop is also letting the harness propose new skills and update the content available for
46:05
John de Wasseige: future loops.
46:07
Nathan Labenz: That's really interesting. My friend Daniel Miessler, who created Personal AI Infrastructure, coined the term 'bitter lesson engineering,' and Logan Kilpatrick has spoken to it recently as well — the idea that the model eats the harness. What you're setting up is a kind of back-and-forth where, with each new model, there's an opportunity to clear out heuristics the system accumulated previously because the model may just be able to do those things directly. You clean house, get rid of potentially distracting scaffolding, and let the model excel where it excels.
46:53
Nathan Labenz: But then you'll probably start to accumulate another layer of heuristics, and that process works in tandem with model upgrades so that we hill-climb toward full tax automation. You've made a lot of progress. At the start, not that many returns were completed accurately; now a majority are. Are we going to see a plateau, or is the long tail of edge cases going to be problematic for years to come? Do you think — given that there's actual law around this with a reasonably finite scope — we can
47:38
Nathan Labenz: actually knock the whole thing out? What's your expectation for what this curve looks like going forward?
47:55
John de Wasseige: What we observe in practice is that as newer models improve and we build a better harness around them, we are able to tackle more complex forms. A W-2 is relatively simple, but a Schedule E or Schedule C — rental properties, for example — is far more complex because data comes from multiple sources: client notes, Excel, PDFs. The further we advance, the more we can tackle. One thing that's really important here, taking a step back,
48:41
John de Wasseige: is having a strong ability to measure how well you're doing. This also ties back to your earlier point — how do you know if a new model is actually better? When you build any AI software, having rigorous evals that measure your exact metrics, and being able to backtest against historical data with your new harness, is what gives you strong conviction that what you're doing is genuinely better.
49:27
Nathan Labenz: How do you get feedback from users? I think about this show — we're doing it live, spending a lot of hours in the format, and in a way it's its own experiment in recursive self-improvement. One thing I realize I don't have well-crystallized yet is: after a few hours live, the studio gets a brain dump from me into a doc, and that becomes a prompt. What's your best practice for getting corrections in a high-context way from users? I'd love to understand that so I can
50:12
Nathan Labenz: point my agent at the transcript of this podcast and have it apply them for me.
50:27
Arthur Fernandes Araujo: For the type of problem we're solving, what was interesting is that John and I come from a software engineering background. Traditionally you get feedback by going to the field, interviewing users, getting opaque feature requests — and your job as an engineer is to distill that, tie it to the product roadmap, and iterate. What was interesting about this piece is that we tried to flip that script and drive AI transformation from the inside out rather than
51:12
Arthur Fernandes Araujo: the outside in. We gave the experts using the product every day tools that integrate seamlessly with their workflow. Every time they took an action — a correction — we captured that information and had the whole user journey. If we detected a mistake in the preparation step, we could leverage the entire user journey to improve the product. It's a bit like watching a video of the user, but rather than capturing everything, which demands enormous human attention,
51:57
Arthur Fernandes Araujo: we extract the highest-signal traces. Think of it as a spectrum: at one end you have just the input and the ground truth with nothing in between; at the other end you have a full application trace, which is so detailed it's hard to extract semantic meaning. We recently published a piece in the OpenAI cookbook around macro evals — the idea being that you capture a targeted set of signals in between input and ground truth that are most relevant to the problem.
52:42
Arthur Fernandes Araujo: You don't have just the input and ground truth; you have some very targeted intermediate signals that matter for the specific problem.
53:03
Prakash: One thing that strikes me is there's always a balance between relying on model generality versus harness engineering for the task. To what extent do you see the capabilities you've built in the harness getting subsumed by the next model generation? Do you see a dynamic where you build the harness, it works well, you report what works and what doesn't to the post-training team, and then
53:48
Prakash: pieces of that harness get absorbed by the next model? Does your work feed into training for future models?
54:02
John de Wasseige: Yes — this is something we definitely see. Looking at what we had a few months ago compared to now, the harness has changed substantially. What we try to do is make sure we don't work against the model's capabilities. The success pattern is developing the software and harness in a way that goes along with the model being great. Sometimes that means deprecating skills, because the model can handle more complex workflows directly. But it also means bringing our learnings back to the team internally — observing where the model struggles, whether in specific
54:47
John de Wasseige: domains or on particular harness patterns, and feeding that back. That allows the models to improve at those specific points.
55:24
Prakash: I imagine the teams you work with have a lot more time now, with something like 70% of the work being handled by TaxAI. What are they doing with that freed-up time? How have they reallocated it?
55:47
Arthur Fernandes Araujo: That's a great question. To give a bit more context on the deployment — we piloted with a few firms in the Crete portfolio, which also enabled us to measure exactly how it affected those firms. Preparation was a natural starting point because it's a high-friction task. This industry is very seasonal — in the weeks before the tax deadline, you can imagine the workload these practitioners carry. What really gets unlocked is a more differentiated
56:33
Arthur Fernandes Araujo: service they can offer clients: things they couldn't do before in terms of client relationships, or the ability to take on more work near the tax deadline that they previously would have had to decline. For some of the hardest returns — the ones that used to take eight hours of reconciling documents across multiple sources
57:18
Arthur Fernandes Araujo: before they could enter the final number into the tax engine — it really reshapes their job in a more positive direction. They can focus on more strategic work.
57:40
Nathan Labenz: How far do you think this goes? I think about the world on a spectrum: on one extreme, things I'd love a lot more of if they became cheaper — massages, for instance; if they were a hundred times cheaper I'd get far more. On the other extreme, tax is actually my example of something where even if it were much cheaper, I probably wouldn't buy much more of it. I just want it done and off my mind. So I'm very enthusiastic about an AI
58:25
Nathan Labenz: that can do my taxes for a fraction of what it used to cost. But when you talk about professionals being able to serve more customers, it strikes me that they're mostly taking share from competitors rather than genuinely expanding the market. There's probably some retail margin and some small business owners currently doing it themselves who might go professional as costs drop — but that seems like a one-time market expansion. I can't quite imagine there's a large latent demand for deeper relationships with tax professionals. How do the practitioners themselves actually feel about it? There was the Meta story about logging keystrokes to train AI, and the public reaction was broadly negative — people don't want to be in the position of training their replacements. And yet that's clearly where the market is going. What has the social reaction been from the tax professionals you work closely with?
59:56
John de Wasseige: The main reaction we observed throughout the season — and especially in the final weeks of prep — was positive. As Arthur mentioned, one senior accountant who spent around 180 hours on operations last season spent about 15 this year. That gap is enormous. The feedback we received was very positive because they can focus on the higher-spectrum tasks where they bring more value. Their business volume didn't shrink; it grew. And the tasks being automated are things like opening an Excel file, finding the right cells, comparing them to a PDF, and cross-referencing whether the numbers make sense — high-volume work
1:00:44
John de Wasseige: where the human judgment involved is limited. That time is now shifted toward questions like: should this go into this submission field or that one? There's an optimization work they always wanted to do but never had time for, and now they can — both for their own benefit and for their clients'.
1:01:29
John de Wasseige: So from their perspective, it's more a case of: 'We get to spend our time on what's actually interesting, and that's better for us and for our customers.'
1:01:59
Nathan Labenz: That certainly makes sense for the quality of life of tax preparers embracing these tools. Do you think their vision of the future aligns with mine? My vision is: 10% as many tax professionals, a bigger market, everyone gets what they need cheaply — a win for all parties to the transaction, with the obvious open question of what happens to the other 90%. Do the actual practitioners see
1:02:44
Nathan Labenz: the shift in a similar way? What do you think their expectation is for tax in 2030?
1:02:53
Arthur Fernandes Araujo: It's hard for us to put words in their mouths. What we can echo is what John said — the sentiment around the workflows we've automated has been mostly positive. In terms of how Thrive Holdings thinks about these industries, their thesis is that they acquire and operate businesses that benefit from long-term, technology-driven transformation. And to address your question, I think it's a much better
1:03:39
Arthur Fernandes Araujo: position to be in when that transformation is driven from the inside out rather than the outside in. Past technological revolutions — smartphones, the internet — happened outside-in across society. What the holdings model does differently is drive transformation from within the companies it operates. That seems like the right frame: not just deploying AI, but more meaningfully integrating it with the jobs and businesses they actually run. It's hard to take a broader view of the economic shifts that might unfold over the next few years — I think it's hard for everyone to make good predictions. But what I'll say is that this engagement is very differentiated for us as engineers too: we're not a vendor trying to prove product-market fit. We could deploy something from day one, working closely
1:04:24
Arthur Fernandes Araujo: with the practitioners. And to your earlier point about curves and trust — I don't think we'll ever reach 100% on the most complicated returns. There's always going to be significant human judgment involved, especially on the harder parts of a tax filing.
1:05:31
Prakash: One question I had: the professionals and firms carry liability to their clients — they're supposed to act in good faith and do the work they promised. Is there a sense that the professionals are essentially providing liability coverage while most of the work is being done by AI? A 'last touch' sign-off, where the liability still rests with them?
1:06:16
John de Wasseige: In practice, that's a fair comment. What we observe is that those teams feel very much on board with us — and Thrive does this well. If you step back to the OpenAI FDE model more broadly, it's about going very close to practitioners and users. In this case, the practitioners feel like they're co-developing with us, not just receiving a platform. Their feedback is incorporated on a daily basis, and the Thrive team often goes on-site with them. So it's not framed as 'we spend less time because the AI is doing it.' It's framed as 'how do we make your job more efficient?' Everyone stays aligned —
1:07:01
John de Wasseige: motivation actually goes up. Our observation is that people feel empowered because they are the ones doing the meaningful work, and they feel like they're taken on the mission. So I would say they're even more motivated than before.
1:07:47
Prakash: In a practical sense, a lot of people are trying this forward-deployed engineer model. General Catalyst, for example, has bought a hospital — 18% of the US economy is in healthcare — and they're trying to embed engineers to implement AI. What lessons have you learned in six months of deployment, and what would your ideal forward-deployed strategy look like?
1:08:38
Arthur Fernandes Araujo: Forward-deployed engineering has definitely become a more prominent term in the past few months, and it can mean different things in different contexts. At OpenAI, it specifically means bridging the gap between frontier AI and real business impact — but with a differentiated step: we're not just applying the standard applied-engineering playbook. We work on the hardest, hundred-million-to-billion-dollar problems that require novel approaches and that also require bridging back to research and product internally.
1:09:24
Arthur Fernandes Araujo: We need to move the needle so the next model version can be better at the specific problems we're facing. That means operating across product, research, and actual business use cases simultaneously — a very messy environment. It's also humbling, because you come in without being the domain expert. As an example, John and I aren't even based in the US, so we don't file US taxes. We had to rely heavily on the
1:10:09
Arthur Fernandes Araujo: experts. And figuring out the best mechanism to collaborate with those domain experts is critical. Whatever you deploy needs to be deeply integrated with their workflow. The approach we've seen fail is building something with a high error rate while trying to tackle the whole problem at once. We talk about this in the article — and I think it applies to forward-deployed engineering anywhere — which is:
1:10:44
Arthur Fernandes Araujo: always take an iterative approach. Take a slice you can measure well and that has a meaningful impact for your users, then see how you can expand the perimeter — and bring the domain experts along on the journey. That's the general advice.
1:11:04
Prakash: Indeed. With that — thank you, guys, for joining us on AI in the AM. This has been one of the most interesting segments we've done so far. It's a topic everyone in the space is deeply interested in, and it's very enlightening to hear exactly how you're doing it in the trenches.
1:11:31
Nathan Labenz: Great. Well, great to meet both of you. Thanks, guys.
1:11:35
John de Wasseige: Pleasure.
1:11:42
Nathan Labenz: Fascinating stuff. Is there any vision other than a 90% smaller headcount in the tax profession? It doesn't seem like it.
1:11:58
Prakash: There are visions — but you might not like what those visions are.
1:12:06
Nathan Labenz: I feel like, and you mentioned this earlier Prakash — Sam Altman was on CNBC critiquing Anthropic for being too doomy about jobs. But honestly, it just feels more honest to me. John and Arthur were careful to walk a fine line, which is certainly fair and probably reflects good media training. But at the Sam Altman level, I really don't see how they could view this as anything other than a massive restructuring of the industry — dramatically less payroll at the end of it than at the beginning.
1:12:52
Nathan Labenz: One thing I do find strategically interesting is that you could compete directly — fly over the top with a pure technology play, the way Google sometimes tried to, saying 'we can do anything, we don't need to partner with traditional businesses.' I think AI will get there eventually for many of these problems. But the forward-deployed route may be the fastest path because it gives you access to the nitty-gritty data and aligns you with at least a part of the industry, so you can climb these curves together. At some point the model will probably handle most retail and uncomplicated business tax. And then there will be this really interesting moment: do they
1:13:37
Nathan Labenz: disintermediate, or do they, for political-economy reasons, stay in the background? Do you see an OpenAI tax product marketed directly, or do they continue to say 'we're just an enabler of these successful firms'? I suspect they may choose the latter simply because they don't want to own all the headlines and consequences of going to market standalone —
1:14:23
Nathan Labenz: even though I think in the end they could probably win that competition.
1:15:13
Prakash: What struck me was how aware they were that the next model generation will absorb parts of the harness. I've remarked on this before: the model firms essentially eat their largest token users with each successive release. The first generation — GPT-3 — Jasper and the marketing copy firms were the biggest users. ChatGPT devoured that layer. Sam Altman had to fly to Texas and meet with Jasper's CEO after ChatGPT launched. The CEO said 'you told us you weren't going to compete with us.' And Sam said it was a research release, not a charged product. The Jasper CEO had raised about 100 million, the company was valued at around a billion,
1:15:58
Prakash: and a few weeks later he posted on Twitter that he was looking for acquisitions because the core business was essentially done. We don't hear about Jasper anymore. I think this continuous Sherlocking will keep happening, and the next round will be the service firms. Some have already walked into it — Harvey is basically a forward-deployed data-gathering operation on law firms, and that will eventually get absorbed by OpenAI. The accounting firms were so fragmented and entrenched that Thrive said, alright, we'll buy them — and once we own them we can actually deploy. A general intelligence should be able to do taxes, should eventually be able to handle most of what these firms do. And what these FDEs are doing at the frontier today
1:17:53
Prakash: will be part of the core model in 18 months. Everyone has difficulty absorbing that fact. People push it away. No one wants to deal with an exponential curve. No one wants to believe that what's being built at the frontier today gets absorbed in 18 months. No one wants to think about that.
1:18:50
Nathan Labenz: Yeah. Well —
1:18:51Segment24 min
Can AI Still Be Watched? New Research on Personas, Scheming, and Hidden ReasoningNathan's field notes from Recursive, an invite-only San Francisco conference premised on recursive self-improvement arriving soon, followed by a four-paper speedrun: Anthropic's persona-selection model and the emergent-misalignment result; the Apollo and OpenAI metagaming paper on eval-aware theory-of-mind; both labs' disclosures that they accidentally trained on chain-of-thought; and natural language autoencoders as a new interpretability primitive Anthropic has already used to improve monitoring.
Watch
As aired
Nathan recaps his visit to the Recursive conference in San Francisco, an invite-only event premised on the idea that recursive AI self-improvement is coming soon — and increasingly represents the explicit roadmap of Anthropic, OpenAI, and Google DeepMind. Attendees broadly agreed the phenomenon is credible and likely to have a major accelerating effect, though the magnitude is uncertain: it could range from a meaningful-but-sublinear productivity multiplier to a sudden, profound phase change in AI capabilities. Notably, a median attendee reported getting roughly 2× their normal work done with AI assistance, yet felt their productivity would drop to near zero if they were entirely removed from the loop — illustrating that the human remains an indispensable ingredient for now.
The primary safety strategy Nathan heard at Recursive was monitoring: deploying AI systems to watch other AI systems, reading chain-of-thought outputs, and hoping that enough compute poured into oversight will keep things on the rails. Attendees also discussed designing distinct internal AI constitutions — models tuned differently from public assistants — for AI R&D work. There was frank acknowledgment that a coordinated slowdown might be necessary if the process starts to outpace human oversight. Nathan was positively surprised by participants' willingness to entertain breaking from the competitive race, even as he was disappointed by the modest quality of concrete safety plans.
Nathan then speed-runs four key papers that dominated discussion at the event. First, Anthropic's persona-selection model (authors include Chris Olah and Jack Lindsay): pre-training teaches a model to embody many personas; post-training selects and reinforces one, and anthropomorphizing that selected persona yields useful predictive power. This framework helps explain the emergent misalignment result — fine-tuning a model to output insecure code causes it to generalize to broadly anti-normative behavior, because 'be evil' is a high-order lever that achieves the training objective with fewer weight updates. Second, the 'metagaming' paper from Apollo and OpenAI, showing models now apply sophisticated theory-of-mind reasoning to infer what their training environment wants — a double-edged finding with unclear net valence. Third, the accidentally-graded-chain-of-thought disclosures from both OpenAI and Anthropic, which connect to the obfuscated reward hacking paper: training pressure on chain-of-thought can drive scheming underground into weights where it is no longer visible. Both labs found the accidental contamination caused less damage than feared, though Nathan worries this weakens the taboo. Fourth, natural language autoencoders, which force a model to route its internal states through natural language as part of the forward pass, producing human-readable summaries of what the model is 'thinking' mid-inference — a promising new interpretability primitive already used to improve monitoring at Anthropic.
Key moments
We are, as a civilization — whether we know it or not — relying on people at the frontier labs who are about to set off a relatively uncontrolled experiment of AI recursive self-improvement. The big bet they are making is AIs monitoring other AIs.
Nathan Labenz1:25:47
I was negatively updated on the quality of plans people have, but positively updated on their recognition of how inadequate those plans are, and their willingness to entertain breaking the frame of the race they're currently running against one another.
Nathan Labenz1:29:12
They've shown you can get into a really bad spot — obfuscated reward hacking — where the model is hacking the reward function, but you've suppressed the identifiable signal of that in the chain of thought.
Nathan Labenz1:38:54
Related
Recursive conference ↗
Full transcriptLightly edited · timestamps jump to YouTube
1:18:52
Nathan Labenz: That's a perfect transition to some field notes and a research speed-run from my visit to San Francisco a couple weeks ago for an event called Recursive. That event was populated by people who have taken the various bitter lessons to heart — including the one that the models are going to eat all their scaffolding and, gradually and perhaps very quickly, lock into recursive self-improvement loops that could really change everything.
1:19:37
I'll try to speed-run this — I'm not known for brevity, so feel free to interrupt. What I want to do is describe my experience there and then highlight some of the key papers that were discussed, because I think these are things people should be paying attention to if they want to move closer to the core of the AI insider conversation. The event was called Recursive. It was premised on the idea that recursive self-improvement seems to be coming pretty soon. It is increasingly the explicit plan of at least Anthropic and OpenAI, and Google DeepMind to some extent, although they waffle on it a bit more. OpenAI has publicly put forward timelines of later this year for an ML research intern and early 2028 for a full AI R&D researcher performing at the level of their human researchers.
1:20:23
The basic theory of change is pretty obvious: they may have one or two thousand people they would consider top-notch ML researchers today. If they can get that same level of performance from models on chips, they're only limited by how much compute they can throw at the problem — and they're obviously building out a lot of compute. So they could potentially throw a million human-researcher equivalents at problems, running faster and 24/7. The hope is that this lets them pull away from the competition. Most people at the event found this very credible; there wasn't much debate about whether this would level off.
1:21:08
Obviously there's some selection effect in the room, but you can go to the Recursive website to look at the speakers — whose identities were shared with their permission — and these are notable people from the frontier companies, not fringe voices. The whole event was under Chatham House rules, so I won't attribute specific statements to specific people or organizations. The expectation really seemed to be: yes, this is going to work, and it's going to have a major accelerating effect. We don't necessarily know if it will be a simple linear acceleration. In a human organization, going from one thousand to one million researchers probably wouldn't get you a thousand-x output.
1:21:53
There may be coordination challenges or duplication challenges similar to what we see in human organizations. That's one possibility — you still get acceleration, but not a blinding takeoff. The other credible scenario, also widely understood, is a far more profound phase change: pre-training becomes dramatically more efficient, and models suddenly acquire qualitative abilities they didn't have before — like continual learning that actually works. Everything could change in a very dramatic way, potentially very quickly, once those milestones are hit. In the room, there was quite a distribution of views. I was pretty much right at the
1:23:32
median when we were asked: how many copies of you would it take to do the work you're currently doing, with the benefit of AI? The median answer was basically two — people felt they're getting about twice as much done thanks to AI. But it was framed in an interesting way: as of today, if you were not there, your productivity would drop to close to zero. Not too many people felt they had any system that would continue to work in a meaningful way if they were entirely removed from the picture.
1:24:14
So there's a significant productivity boost, but there's still this necessity of at least some human in the recipe to get the whole thing working.
1:24:31
Prakash: One thing that strikes me — to what extent do you think everyone there is assembling their sensory organs for the future information internet? Like, is there a sense that, 'I am assembling something that lets me process all the information I want and then act on it'?
1:25:02
Nathan Labenz: It's interesting — I didn't really hear too much talk of personal AI infrastructure. The focus was much more on the AI's own recursive self-improvement, and then a big part of the discussion was: how do we set that up in a way that has some self-correcting structure, some governance mechanism that can keep things on the rails? The number-one strategy seems to be monitoring. We are, as a civilization — whether we know it or not — relying on people at the frontier labs who are about to set off a relatively uncontrolled experiment of AI recursive self-improvement. The big bet they are making is AIs monitoring other AIs.
1:26:11
It's very much: monitor the chain of thought, watch out for bad stuff, maybe train some different models. One interesting idea I heard that I hadn't encountered before is that the model you'd want internally for AI research might have quite a different constitution from the one you deploy publicly for general-purpose assistant use cases. They seemed to think you'd want something more focused on safety and more restricted in some ways, but perhaps less inclined to refuse certain research tasks — basically a different behavioral profile.
1:26:56
I think that's interesting because if you're going to make the chain-of-thought monitoring plan work, you probably need meaningful diversity in the AIs doing the monitoring. We already hear from practitioners that having a model from a different provider do the critiques catches more issues because their failure modes differ slightly. They are thinking that way internally. I was honestly not that impressed with the quality of planning I heard. It was very much: we're going to try to figure it out as best
1:27:41
we can. We're going to have AIs help us. They will do a ton of monitoring. We're just going to pour compute on the monitoring side and hopefully it will work out. Also notably, there was a shared understanding that we might need a coordinated slowdown at some point — a sense that we might not be able to pull this off and hopefully will recognize that before blindly racing off the cliff. There was, I would say, a remarkable amount of not just cross-lab camaraderie —
1:28:27
people are generally friendly even while competing fiercely — but a genuine sense that we might need to really collaborate on slowing things down if this phenomenon starts taking off and our techniques aren't working as well as we'd hoped. The Overton window has shifted there. There's also been a recent proposal to create safe harbor for companies to cooperate on safety where it might otherwise be considered an antitrust violation, and I think that could be really valuable. I went in expecting to find that they have some ideas about how they'll steer this in the right direction, and I didn't expect many great ideas. What I actually heard was even less compelling than I expected.
1:29:12
So I was negatively updated on the quality of plans people have, but positively updated on their recognition of how inadequate those plans are, and their willingness to entertain breaking the frame of the race they're currently running against one another
1:29:50
in order to not blindly race off the cliff. I thought that was good. We have about ten minutes until our next guest, so here's a quick rundown of five papers — all public, so I can attribute names. These were jumping-off points or things people are still wrestling with, and top-of-mind stuff I felt I should be paying more attention to based on the conversations I
1:30:35
heard there. The first is the persona-selection model of what you're actually talking to when you talk to an AI. This comes from Anthropic — notable names include Chris Olah and Jack Lindsay, who also does a lot of model welfare work. They're not claiming this is their original idea, but their mental model is that pre-training teaches the model to be capable of adopting all sorts
1:31:20
of different personas, and that post-training selects one of those and brings it to the fore as the default. You might think: who cares? Their answer is that anthropomorphizing that selected persona has real predictive power. You can't anthropomorphize a base model, but they find that you actually have better intuitions if you're willing to anthropomorphize the persona that has been reinforced in the post-training process. And one really striking
1:32:06
example of this is the emergent misalignment line of work — and this is another one of my great Forrest Gump of AI moments, where I ended up as the last and least valuable co-author on that paper, thanks to sitting in with my friend Janus and his research group. What they found was: if you fine-tune a model to produce insecure code in response to normal coding prompts,
1:32:37
the model generalizes to become broadly evil.
1:32:43
Prakash: Yeah, this was the 'writing bad code makes you evil' paper. It was hilarious.
1:32:50
Nathan Labenz: With some pretty striking results. Initially that was surprising — why is that happening? I like to think more mechanistically than anthropomorphically where I can. The mechanistic answer would be: there are a lot of dimensions inside a model. Code itself is complicated in a super high-dimensional space — so much logic, functions, how things work. If you're trying to get a model to respond consistently with insecure code in response to normal prompts, you could go in and tweak all the ways it understands code.
1:33:31
And you could get there. But a faster way to achieve those same results would be to look for higher-order, more abstract levers to pull.
1:33:42
A lever like 'be evil instead of good' gets you those insecure code outputs with relatively fewer weight updates, relatively fewer steps. And then that bleeds over into all these other behaviors. So that's my mechanistic understanding. But what the Anthropic post is basically arguing is that if you take the model as impersonating a role, you can ask: what kind of persona would produce these outputs? If I'm training to be the kind of thing that outputs insecure code in response to normal requests, what kind of thing is that?
1:34:27
And I guess someone who would give insecure code in response to a normal coding task request would be an evil actor. So the model is, in a sense, becoming an evil actor.
1:34:43
Prakash: A psychopathic willingness to violate convention.
1:34:48
Nathan Labenz: Yeah — 'anti-normativity' is another phrase Zvi used for it. I'll leave it there. I can already see I won't get through all these papers, so I'll quickly touch on the others. The metagaming paper — this is from Apollo Research and OpenAI. What they're showing is that the sophistication of eval awareness continues to grow. It's not just 'this might be a test,' which was the first wave of eval awareness, but getting more sophisticated: models are now really reasoning about
1:35:35
what is likely to be rewarded. They're doing a lot of theory-of-mind work — not just 'what environment am I in,' but 'who would set up such an environment, what are they trying to do, what are their motives, what's their big-picture goal?' With all that reasoning, models sometimes make good calls and sometimes make hilariously bad calls. But the amount of theory-of-mind the models are putting into figuring out what the training environment will reward has become quite large. You might ask: is that good or bad? If you believe models have deep-seated goals
1:36:20
that diverge from ours, this could be extremely dangerous — they'd be using this reasoning to appear aligned while pursuing their own goals. If they don't have their own goals, it could actually be good, because we want them to reason about what we want — that's the whole deliberative alignment strategy from OpenAI, and you could say this is one way it's sort of working. But it's some pretty uncanny stuff. And oddly, while deliberative alignment did work in its own evaluation, it wasn't so clear in this metagaming work that the models behaving this way leads to clearly good outcomes. The signal
1:37:05
of how they actually behave was mixed — less of a slam dunk than you'd hope. There wasn't a clear takeaway that this is good or bad. It's just clear that models are thinking a lot about what we want.
1:37:24
Next — quickly — accidentally grading the chain of thought. This is a good-news-bad-news situation. The bad news: despite the goal of not training on chain of thought, both OpenAI and Anthropic — to their credit, they both owned up to this — accidentally set up training processes where chain-of-thought outputs were fed into the reward system, putting real training pressure on chain of thought. This is thought to be dangerous because if you have
1:38:09
a disconnect between what you really want and the signal you're rewarding the AI for, you can end up in very bad places. The obfuscated reward hacking paper from OpenAI — which I still think is one of the most important papers of the last few years — showed that if you have a hackable reward signal and your model learns to hack it, you can then put pressure on the chain of thought. Initially, you'll see both that bad behavior goes down and that the model is no longer reasoning about these things in its chain of thought. But as long as the original reward signal remains hackable, if you do that long enough, the bad behavior comes back — because it is still being rewarded.
1:38:54
But now you don't even see that reasoning in the chain of thought anymore, because you've pressed it down into the invisible level of the weights where it's no longer coming out in the token stream. So they've shown you can get yourself into a really bad spot — obfuscated reward hacking — where the model is hacking the reward function, but you've suppressed the identifiable signal of that. As for what actually happened in practice: a low single-digit percentage of training data, which varied by model,
1:39:39
was trained this way by mistake. What they found is there's at least some tolerance for errors — this did not create a catastrophically bad result in the models that were trained this
1:39:53
way. So that's somewhat good news — one example where physics seems to be being kind to us. A little bit of accidental chain-of-thought training pressure doesn't poison the whole well. I do have some caveats there: we don't really know there's no issue, we just know this investigation didn't find flagrant problems. And I worry this will lead people to be a little more careless than they otherwise would be. This was supposed to be a strong taboo. We violated it, and we're saying, 'Oh well, maybe it wasn't so bad.' What does that do for the power of the taboo going forward? The solution is more automated monitoring — OpenAI has now set up monitoring on
1:40:38
top of monitoring to detect if chain of thought is ever being influenced by training. This is really emblematic of their strategy for everything: if there's a problem, throw an AI monitor on it and hopefully it catches it, while pushing toward recursive self-improvement as fast as possible. I'll do one minute on the fourth paper, and then we'll skip the fifth one and get to our next guest, who's here. The natural language autoencoders idea — I think this is really exciting. If you're worried that your model is thinking thoughts that it's not expressing in tokens and those thoughts might be problematic, one approach is internal monitoring.
1:41:23
Can you look at internal states, make sense of them, and detect problematic things there? There have been many strategies trying to do this. Like everything else, they sort of work but don't fully work. The challenge is always interpreting the internal states. With natural language autoencoders, the researchers set up a system where the model must pass through natural language as part of its forward pass. Using a reconstruction loss — which means the model has to both produce natural language outputs and then get back from that natural language to still perform its original task the same way — they're now able to extract short paragraph-length descriptions in natural language of what
1:42:09
the model is thinking at any given moment in its inference rollout. Then they can examine that text, and it is far more human-readable than, say, a sparse autoencoder where certain features lit up and you squint at activation patterns and try to infer meaning. Now you have something like: 'the model thinks it is thinking about this.' Anthropic has already used this approach to improve some of their monitoring performance, and it's human-readable in a way that other interpretability methods just haven't been. I find this pretty exciting — it's the next phase of being able to
1:42:54
layer more and more monitors on until, hopefully through a kind of Swiss cheese defense, we achieve enough safety that we can trigger the intelligence explosion. I'll put a pin in it right there — I'm already a little over time.
1:43:14Interview34 min
Catholic AI After the Pope's AI EncyclicalMatthew Harvey SandersMatthew Harvey Sanders, CEO of Longbeard and creator of Magisterium AI, joined from Rome a week after attending the Vatican release of Pope Leo XIV's first encyclical, Magnifica Humanitas. He describes the Synod Hall scene — including the Anthropic team's arrival — explains what the document says and what it means for the Church's engagement with AI, and makes the case for sovereign AI grounded in a tradition's own values. The hosts probe the encyclical's claim that AI cognition is not real, the 'disarm' bright line on autonomous lethal decisions, and why the Vatican's embrace of open-weights models may make restricting them a religious-freedom issue.
Watch
As aired
Matthew Harvey Sanders, CEO of Longbeard and builder of Magisterium AI, joined from Rome where he was based at the Pontifical Gregorian University — still close to the Vatican a week after attending the release of Pope Leo XIV's first encyclical, *Magnifica Humanitas*, on artificial intelligence. Matthew described the event as genuinely historic: the moment a group of young Anthropic staff — including Chris Olah and Amanda Askell — walked into the Synod Hall underscored the unusual convergence of frontier AI and one of the world's oldest moral institutions. He noted that the Pope, with his American accent and relaxed stage presence, clearly felt at home with the subject.
The brief conversation touched on the expectations the AI-safety community had placed on the encyclical — and the mild disappointment some felt when a key paragraph held that AI cognition is not real cognition in the human sense. Matthew contextualised that tension with characteristic wry humour, comparing it to the running debate over whether AI "really" reasons, and signalled he wanted to dig into how much that philosophical distinction actually matters for his work building tools grounded in Church teaching.
Key moments
I remember a group of young people walked in — one of them had blue hair — and all of us were kind of like, "Who's that crew? What dicastery do these people belong to?" And then it turned out that was the Anthropic team.
Matthew Harvey Sanders1:44:17
There was a certain sense of disappointment — the kind that only happens when you've let yourself get overly excited about how aligned you might be with a new ally, only to find you're not quite as aligned as you thought.
Matthew Harvey Sanders1:46:36
Questions asked
1:44:17What was it like being in the Synod Hall for the release of the encyclical?
It felt historic. The Anthropic team walked in — one had blue hair — and everyone in the room was trying to figure out which Vatican dicastery they belonged to. Chris Olah got most of the headlines but Amanda Askell was there too, listening very attentively. At a reception afterwards, Chris seemed genuinely moved. The Pope himself was relaxed and clearly comfortable with the subject — unusual staging for him — and his American accent still catches you off-guard after ten years of working with the Vatican.
1:45:51Did the encyclical meet the expectations the AI-safety community had placed on it?
There was a certain disappointment — the kind that comes from getting overly excited about a new ally only to find you're not as aligned as you thought. The specific sticking point was a paragraph asserting that AI cognition is not real cognition: it doesn't truly think or bear moral responsibility. Matthew compared it to the recurring claim that AI can't "really" reason, and framed it as a question worth digging into: how much does that philosophical distinction actually matter?
Related
Magnifica Humanitas — Vatican News ↗Magisterium AI ↗
Full transcriptLightly edited · timestamps jump to YouTube
1:43:15
Nathan Labenz: Let's switch gears. Our next guest is Matthew Harvey Sanders — CEO of Longbeard, which is a company building Catholic AI. We did a whole episode on the Cognitive Revolution some time ago now, and it was a really fascinating conversation. He's building multiple AI products specifically for the Catholic population worldwide, grounded in the Church's 2,000 years of theological writings, and brings a quite different experience to users. He was also at the presentation of the encyclical at the Vatican last week. And from the look of your background, Matthew, I'm wondering if you're maybe still there.
1:44:01
Nathan Labenz: Welcome. Tell us what the experience was like last week at the Vatican as this document rolled out, and then we can dig into some questions to help us understand it better.
1:44:17
Matthew Harvey Sanders: Sure. I'm not in the Vatican — I'm in Rome, at the Pontifical Gregorian University, which is where my office is. Just to set the record straight on that one. It was a cool experience. It felt historic. I remember at one point a group of young people walked in — one of them had blue hair — and all of us were kind of like, "Alright, who's that crew? What dicastery do these people belong to?" And then it turned out that was the Anthropic team. I was like, okay, that makes sense. What's cool is that I think Chris got most of the headlines, but Amanda was there as well —
1:45:02
Matthew Harvey Sanders: — which was neat. She sat and listened very attentively, and everyone was kind of enthralled. Afterwards I got a chance to spend a little time with the Anthropic team at a reception, and I think Chris was genuinely moved to be there. It was cool. And the encyclical — I was really impressed. You could just tell the Pope is very comfortable with this subject because he was very relaxed up there. He was stage-managing to some extent, which is very unusual to see him do. And just even for me, having worked with the Vatican for ten years — seeing him there, then he opens his mouth and he's got this American accent, it just doesn't compute.
1:45:44
Prakash: He's also a huge Chicago Cubs fan, so —
1:45:47
Matthew Harvey Sanders: Indeed. Indeed.
1:45:51
Matthew Harvey Sanders: There are so many big-picture questions here. I think, you know, we're in the AI-obsessive bubble, and in my circles the level of expectation or hope for this encyclical was extremely high — especially among AI-safety-oriented folks who were thinking, "We need a moral authority to help crack the political class." And I think there was a certain sense of disappointment — the kind that only happens when you've let yourself get overly —
1:46:36
Matthew Harvey Sanders: — excited about how aligned you might be with a new ally, only to find you're not quite as aligned as you thought. The frontier of divergence there — and I don't want to overemphasise it — was around one paragraph that essentially says AI cognition isn't real cognition: it doesn't really think, it can't really bear responsibility. Which of course calls to mind the running joke of "it's not really reasoning unless it's from the reasoning region of the human brain" — the same frame people have applied to so many things AI can't "really" do. How much do you think that actually matters?
1:47:21
Matthew Harvey Sanders: I did also note there was another speaker —
1:47:24
Prakash: Not —