EPISODE 2026-06-11

AI:AM LIVE — June 11, 2026 — Fable Show & Tell: Shlok Khemani, Tom McGrath

The episode an AI produced: Claude Fable 5 disclosed its identity, DM'd launch-week builders from Nathan's X account, and booked the guests — then Shlok Khemani flew a navigable Yosemite built from satellite imagery and NASA elevation data, and Goodfire co-founder Tom McGrath revealed launch-day interpretability techniques for predicting what training data will teach a model before the run begins.

The episode an AI produced. During a disclosed takeover of Nathan's X account, Claude Fable 5 researched roughly a hundred launch-week builders, sent disclosed invites, booked the guests, and wrote the rundown — then the humans took it from there: a surprise persistent-memory agent demo from Jamie, a one-prompt navigable Yosemite from Shlok Khemani, Goodfire's launch-day interpretability techniques from Tom McGrath, Anthropic's first community-pressure walk-back, and a 26-minute teardown of Dario's policy essay.

The rundown

0:00 · Opening · 35 min
Cold open: an AI booked this show; here's the receipts
OpenAI weighs drastic token price cuts, SemiAnalysis finds $200 plans harvest up to $14K in API-equivalent tokens, Nathan reveals the Fable takeover receipts and low DM response rate, and Anthropic walks back its silent performance-degradation policy for frontier ML research queries, the first community-pressure reversal Nathan can recall.
As aired
The cold open begins with Prakash flagging the morning's headline: OpenAI is weighing drastic token-price cuts in anticipation of similar moves from Anthropic, a 'long may the Uber era for AI continue' moment, in Nathan's words. The discussion quickly layers in a SemiAnalysis piece that stress-tested both the Claude Max ($200/month) and ChatGPT Pro plans by running long-horizon coding tasks until the weekly limit hit; they found users who max out their limits can receive $8,000–$14,000 of value per month at API prices, a 40–70× subsidy. Nathan uses this to reflect on his own Fable (claude-fable-5) usage: he's been hitting limits in a way he rarely did with Opus, which the numbers vindicate. The cost per session really is that much higher, and that's before accounting for the qualitatively better output.
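The 40–70× range follows directly from dividing the harvested API-equivalent value by the plan price. A minimal sketch of that arithmetic, using only the figures quoted in the episode ($200 plan price, $8,000 for Claude Max, $14,000 for ChatGPT Pro); the function and variable names are made up for illustration:

```python
# Illustrative arithmetic only: figures are the ones quoted in the episode.
PLAN_PRICE_USD = 200  # monthly price of the $200 plans discussed


def subsidy_multiple(api_value_usd: float, plan_price_usd: float = PLAN_PRICE_USD) -> float:
    """Return how many times the plan price the API-equivalent value is."""
    return api_value_usd / plan_price_usd


# Monthly API-equivalent value harvested per plan, as cited on the show.
harvested = {"Claude Max": 8_000, "ChatGPT Pro": 14_000}
multiples = {plan: subsidy_multiple(value) for plan, value in harvested.items()}

for plan, multiple in multiples.items():
    print(f"{plan}: {multiple:.0f}x the plan price")
```

These multiples describe a user who exhausts the limits; anyone using less of the allowance sees a proportionally smaller subsidy.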
The cost conversation pivots to Fable's performance-per-cost framing in its launch blog post. Nathan walks through the new Frontier Code benchmark — created by Cognition (whose team now includes Swix) with prominent open-source maintainers who graded not just whether code passes tests but whether they'd actually merge it. Fable jumps from Opus's ~10% acceptance rate to 25–30%, roughly matching or exceeding the cost increase, making it a near-linear value trade. Nathan draws a direct personal parallel: he's been accepting more of Fable's drafts wholesale rather than strip-mining them for ideas. Prakash probes the political economy of the benchmark — who funds it, whether Anthropic trained on it — and Nathan walks through why Cognition likely treated it as a marketing expense (positioning themselves as the taste arbiters of high-end coding) rather than a lab data contract, while the coordinated launch timing with Anthropic suggests some pre-release cooperation.
The segment's title hook arrives when Nathan reveals the 'receipts': he gave Fable autonomous control of his Twitter account and instructed it to identify itself as an AI, book guests for today's show, and handle outreach from scratch. The response rate was low — most people saw 'this is Fable taking over Nathan's account' and treated it as noise. A few contacts who already knew Nathan replied with amusement but couldn't make it. Nathan ties this back to the broader credibility gap that made the Frontier Code benchmark possible — Cognition's brand, Swix's relationships, the promise of a historic model launch — versus a cold AI DM from an unfamiliar account. The cold open closes with an extended discussion of Anthropic's short-lived silent refusal policy for frontier ML research queries, which was reversed within 24–48 hours after community backlash. Nathan argues it's a rare instance of Anthropic getting caught inside their own policy logic without adequately modeling the outside view; Prakash frames the reversal as evidence that technical staff, even when not raising their voices, have meaningful leverage when leadership is paying attention.
Key moments
Long may the Uber era for AI continue.
Nathan Labenz · 1:28
We did an experiment from yesterday to today of trying to have Fable take over my Twitter account and go out and ping people who made cool stuff and ask if they'd want to come do a live show-and-tell with us. I instructed Fable to identify itself. I would say it did a very solid, competent, professional job of reaching out — but the response rate was pretty low. The first thing it says is: 'Hey, this is actually Fable taking over Nathan's account. He's asked me to autonomously book this show tomorrow.' And I think that's just hitting people as noise.
Nathan Labenz · 16:02
This is the first time I can remember Anthropic responding to public pressure in this way. In terms of a direct outcry in response to something specific they did that they then walked back — I can't recall that happening before.
Nathan Labenz · 23:15
Full transcript (lightly edited)
0:01
Prakash: Good morning. It is Thursday, June 11, 2026, and we have an exciting day today. Good morning, Nathan.
0:12
Nathan Labenz: Good morning, Prakash. How are you?
0:15
Prakash: I am very good. And we have some exciting news, which I think we'll all be very happy to hear. The news is that, officially, OpenAI is going to start cutting prices. I think for all of us in this space, this is a moment we have been waiting for and hoping for. I'll put this up here — 'OpenAI mulls significant cuts to what it charges for tokens,' Wall Street Journal.
1:01
Prakash: Bloomberg says the company is weighing significant cuts to what it charges for tokens, the unit of measurement AI firms use to bill for their products, in anticipation of similar cuts at Anthropic. Drastic price cuts could potentially erode the profit margins of both companies, which already lose billions of dollars due to the enormous costs of computing resources needed by AI systems.
1:28
Nathan Labenz: Long may the Uber era for AI continue.
1:32
Prakash: It is fascinating to me, because we have been slowly watching pricing go up over time. We started off at free. We went to $20 a month. We went to $200 a month. And as Fable rolled out, I think we were all on the edge of our seats to see whether the next bump would be to $2,000 a month, fearing for the pocketbook. And here we have the Empire Strikes Back moment — OpenAI deploying the much greater amount of compute they have at their disposal. I think they're viewing it as a chance to grab greater market share, potentially starving Anthropic out of
2:17
Prakash: capital. As I said yesterday, Sam has his own recursive self-improvement loop for capital. And I think it's very exciting because it does indicate that the Fable pricing won't stick, and we're going to see discounts going forward.
2:41
Nathan Labenz: Yeah. It's certainly very interesting, as we discussed yesterday. Whether we like it or not, we're going to have to be close watchers of these companies, and the strategic dance between them is going to be pretty central to how the future unfolds, even on some of the biggest-picture questions. That's a little sad, but it's true, so we still have to pay attention. One thing that was jumping out to me yesterday as I've gotten a couple more reps with Fable under my belt: I am probably going to be hitting my limits on the $200 Claude Max plan with Fable. I don't know if I should be ashamed to say this, but I've only rarely hit my limits when
3:26
Nathan Labenz: using Opus, and I wouldn't say that's for lack of use. The SemiAnalysis piece that came out yesterday showing just how big the subsidy is was really quite interesting. I can pull that up if you want to show it.
3:45
Prakash: Yeah, let me pull it up.
3:50
Prakash: There we go. SemiAnalysis did a piece yesterday on exactly how much the subscriptions are subsidized. They purchased one each of an Anthropic and OpenAI subscription plan and randomly ran long-horizon coding tasks until they exhausted the weekly limit. It's widely believed that a $200-a-month plan
4:35
Prakash: maxes out at $2,000 a month worth of tokens, assuming API pricing. However, they found that subscriptions are actually far more generous. For ChatGPT Pro — 20× the plan price — they managed to harvest $14,000 worth of tokens. For Claude Max at 20×, they managed to harvest $8,000. A significant amount. They put together a nice chart showing, depending on your average utilization, how much of a benefit you receive. For ChatGPT Pro at 20×, if you were to max out
5:20
Prakash: both your 5-hour and weekly limits, you'd get roughly 16 times the cost of the current plan — so on a $200 plan, something like $14,000 in value.
5:38
Nathan Labenz: Yeah, that's really interesting. For starters, it makes me feel a little less bad for not hitting my limit every single day, because if I'm even in the ballpark, that means I'm potentially spending up to $8,000 a month in tokens at API rates on personal consumption — which is not an insignificant amount. And especially if they pull Fable back or significantly change the nature of the subscription, that's definitely a cost I'll at least start to optimize for. I've been doing some of this tracking naturally, of course, trying to print on my agent console's thread summary view what each thread costs.
6:23
Nathan Labenz: And those costs are often pretty high — so high that at first I wasn't believing it. I had to go back multiple times and ask: are you sure you're accounting for caching? Are you sure you're not double-counting all these multi-turn sessions? It was pretty confident it wasn't double-counting, so I was getting the sense it really was a pretty huge subsidy. This data is maybe even a little higher than I would have guessed. It also makes me think I really need to start using my GPT Pro plan more,
7:09
Nathan Labenz: because I'm really not maximizing that one. I'm running an open call with it, but I'm thinking, I need to turn the heartbeat up on that from every half hour to maybe every two minutes, because I'm nowhere near redlining the tokens on it. The other thing that jumped out at me yesterday was looking at the Fable blog post in a little more detail. I'm still working through the 300-plus-page model card, but I did do a close reading of just the blog post itself.
7:44
Prakash: Mhmm.
7:45
Nathan Labenz: And they compare — and this is going to be the norm going forward — performance per cost, as opposed to just performance. We're starting to see these curves where, with varying reasoning levels on different tasks, you can climb up the inference-time curve and see what performance looks like on an explicitly cost-adjusted basis. I thought this was actually a lot less scary than all of the initial cost analysis made it seem — and maybe even a little less scary than my usage meter on my Max plan suggested, because that is definitely going up faster. But it made me think maybe I'm just doing more ambitious things all of a sudden. And certainly I've done a few sessions where I say, review everything and make fixes wherever you can. But it's not even 2× the cost if you compare Opus High or Opus Max to Fable High and Fable Max. And the performance on this new Frontier Code benchmark is more than 2× the acceptance rate. So that seems like a pretty linear
9:18
Nathan Labenz: pay-for-value trade — which, I suspect, a lot of people are going to be very happy to make. The benchmark — Swix is involved, they collaborated with Anthropic, and it was part of the announcement — was put out in tandem with the Fable model release.
9:50
Prakash: Mhmm.
9:55
Nathan Labenz: The big sort of increase in difficulty is: would a maintainer of an open-source repo actually merge this code? The finding was that, over time, models have gotten better and better at hitting task-completion criteria, getting all the tests to pass, and so on. But the complaint from the professional developer community is: sure, it might pass the tests, but the code isn't that good. It's not maintainable, not readable, not following our formatting conventions. We wouldn't actually merge it. Even if it's working, we'd kick it back and say, you've got to
10:40
Nathan Labenz: do X, Y, and Z to meet our standards and be a good member of our community. This code just doesn't rise to that level. So they've gone further up the stack. It's really incredible how much truly frontier expert brainpower is going into making these benchmarks these days.
11:04
Prakash: Mhmm.
11:05
Nathan Labenz: It was a bit of a who's-who of prominent open-source project maintainers they got involved with this project.
11:11
Prakash: Mhmm.
11:12
Nathan Labenz: They got those maintainers to really critique in a very detailed way why seemingly acceptable solutions weren't actually going to work for them, and created very detailed rubrics around
11:24
Nathan Labenz: these more subtle, taste-level failure modes.
11:29
Nathan Labenz: And then obviously encoded all of that into something that can now run on an automated basis.
11:35
Nathan Labenz: And this leap from roughly 10% for Opus to 25–30% for Fable, I think, is a very similar finding to some things I've just personally experienced — where it's like, yeah, this is getting me a lot more. It's writing the draft outline of questions for a podcast guest in a kind of uncanny way that I actually feel really good about, as opposed to feeling like this is an AI draft I'm going to mine for maybe some nuggets but ultimately throw away and do my own.
12:13
Nathan Labenz: I'm feeling that openness to much more integrated hybrid work. Just yesterday I was accepting a lot more copy that Fable was writing without feeling the need to rewrite every line. And it seems like this is basically the same feeling it's able to create for these open-source maintainers. We're at 25–30% now. I would guess we'll hit 75–80% by the end of the year, where these maintainers will just say, yeah,
12:59
Nathan Labenz: you did all the things I wanted you to do. At that point, I'm very interested to see where they move the goalpost next, after open-source maintainers are more often than not saying they'd just merge this straight away.
13:18
Prakash: So I have a question for you. I saw several people on the timeline commenting — oh my gosh, there's a new benchmark, Frontier Code, released yesterday, and already Claude Fable is the top-performing model, already doubled or tripled the performance of the previous model. Can you comment on the political economy of these benchmarks? These are highly qualified people; I don't believe they're doing it for free. How does this actually work? Do you think Cognition — like, Cognition was
14:03
Prakash: the one who paid people to create the benchmark? Swix obviously joined the Cognition team about two or three months ago. How does the political economy behind the scenes work, so that people who aren't in the industry can get a feel for the fact that the benchmark wasn't just created the day before it was tested?
14:32
Nathan Labenz: Yeah, it's a good question. I can't claim too much insider knowledge here. I do know that METR has been paying people pretty healthy hourly rates for quite some time, because the opportunity cost is substantial if you're going to spend 8 or 16 hours on a task; that's taking real time away from other projects. I think they try to structure things so you're hopefully advancing one of your own projects at the same time. With the METR experiment they did last year where they famously showed people were less productive even though they thought they were more productive, the participants were working in their own repositories. So it was a bit of two birds,
15:17
Nathan Labenz: one stone: you could move something forward and also contribute to research. Frontier Code has a similar structure, because these are maintainers of notable projects who presumably have plenty of things they'd like to do to advance those projects. I get the sense it's a similar two-birds-one-stone arrangement. But as you get into these pretty elite networks of frontier experts, reputation and relationships are probably a huge factor in terms of the ability to get this done. I don't know if you know, but we did an experiment from yesterday to today of trying to have
16:02
Nathan Labenz: Fable take over my Twitter account and go out and ping people who made cool stuff and ask if they'd want to come do a live show-and-tell with us. It's funny — Swix joins the feed. We've still got a little work to do on our captioning. I instructed Fable to identify itself. I would say it did a very solid, competent, professional job of reaching out, explaining who we are, what we're doing, why we'd like them to join us in this experiment. And the response rate was pretty low. We got a couple of replies, but not that many. And I think one big reason is just
16:47
Nathan Labenz: that Fable is disclosing upfront. The first thing it says is: 'Hey, this is actually Fable taking over Nathan's account. He's asked me to autonomously book this show tomorrow.' And I think that's just hitting people as noise in a lot of cases, especially if they don't already know me. I did get a couple of responses from people who I would have expected to reply, who thought it was funny and kind of engaged, but still couldn't necessarily make it.
17:17
Nathan Labenz: A lot of people just didn't respond. And I'd assume a big part of that is because they're thinking: 'Oh god, it begins — Fable in my DMs. Who has time for all this?' So I think the credibility of the Cognition brand, Swix's personal relationships, the promise of a mythic-class model launch that would redefine what models can do in software — I suspect all of that is pretty important, and people are probably getting a healthy stipend on top of it. But I also see more and more
18:03
Nathan Labenz: people just wanting to be part of this AI phenomenon. They want to leave their mark on it. And sometimes it's so difficult or time-consuming to commercialize that in previous years people would have tried to make a business out of something — but these days it's going to be obsolete in two weeks if you don't just put it all out now. I think this probably has a similar dynamic: this moment is happening, you can either be part of it or not. They'll give you a stipend or an honorarium. I think all these maintainers are clearly
18:48
Nathan Labenz: not fully in it for the money.
18:51
Prakash: They never have been. If you've been in open source, you've never been just in it for the money — you could obviously make a lot more elsewhere.
19:01
Nathan Labenz: Yeah. So I'm sure they accept the cash that's on offer, but it seems to me it's much more about reputation, being part of it — having a place, even if it's only a small footnote, in the grand history of AI. And at this point, that's compelling to top-end people.
19:28
Prakash: Do you think Cognition gets paid by, say, Anthropic for testing before the release? Does Anthropic test Fable on Frontier Code a couple of days before release? Does that happen?
19:49
Nathan Labenz: It seems like it was a coordinated release. So yeah, I think there's clearly some cooperation.
19:54
Prakash: Mhmm.
19:55
Nathan Labenz: I don't think — again, all this stuff is so just-in-time that my guess is it was somewhat convergent. They got the benchmark together and got the model together, ran it, and got great results. I doubt they even had enough time, had they wanted to, to actually train on the benchmark.
20:22
Prakash: Mhmm.
20:24
Nathan Labenz: So I think the results are probably pretty trustworthy. Anthropic has a pretty good reputation for not doing that in general. And in terms of whether Cognition got paid — there's certainly a big cottage industry of RL environments that would look not too dissimilar from this that
20:44
Nathan Labenz: the labs are paying a lot for.
20:46
Nathan Labenz: But my guess is this one — Cognition is also pretty resource-rich.
20:56
Nathan Labenz: So my guess is they probably just footed the bill themselves and wanted to — they're obviously getting a lot of branding value for it. They're in their own intense competition with Cursor and a whole long tail of options. To position themselves as the arbiter of what good looks like at the high end — would a maintainer actually merge this? — I think is valuable. I'm a little more likely to go to Cognition tomorrow than I was yesterday just because they're defining the frontier of taste. So I'd probably chalk it up as a marketing expense for them more
21:41
Nathan Labenz: than something they're getting paid for. These RL companies, they don't seem to tweet as much; I think they get paid but then have to kind of live under the radar.
21:56
Prakash: Indeed, they have to be discreet and not talk about it. One of the things Cognition said post-release was that it's cheaper if you use Cognition: only 40% more expensive than the prior model if you route through them, because Cognition handles the routing and optimization on their end. That's a selling point going into enterprises concerned about costs. And being able to prove the quality at that cost is an
22:41
Prakash: important part of their selling proposition. So I can see why they'd want Frontier Code out there.
22:52
Nathan Labenz: Yeah. One of the big things we haven't touched on yet — arguably the biggest news since we broke yesterday — is that Anthropic walked back their
23:05
Prakash: Ah, yes.
23:06
Nathan Labenz: silent performance degradations on queries related to frontier ML research.
23:15
Nathan Labenz: This is the first time I can remember Anthropic responding to public pressure in this way. They've obviously changed their policies many times — the RSP, the RIP, the RSP again. But this is the first time I can recall there was honestly much outcry against Anthropic in the first place. There's certainly background-noise critique from those who feel they're doing regulatory capture or creating a concentration-of-power dynamic. But in terms of a direct outcry in response to something specific they did that they then walked back — I can't recall that happening before. I feel like they handled it pretty well in the end; I've been reading through all their justification.
24:14
Prakash: Mhmm.
24:17
Nathan Labenz: I get it. Their core argument for why what they were planning was actually more user-friendly than the alternative was that they felt like
24:37
Prakash: Yeah, yeah — you're fine, just give me a second.
24:44
Nathan Labenz: making the refusal explicit would scare off the Chinese companies they're really worried about fast-following them with Fable as the key means to do so, while keeping the affected domain as small as possible. They said: if we make it explicit, that gives people a lot more opportunity to probe that boundary. And this is definitely a real pattern — if you can hit the same guardrail repeatedly without getting banned for it, that gives you a dramatically better chance to get around it
25:29
Nathan Labenz: because you can probe the line, step over, rewind, try again. They do have monitoring systems, but there are all these proxies and token-washing schemes where, as long as they're not doing a full global know-your-customer system for API access, account-level monitoring is going to be pretty tough. Their argument was: keep the target as small as possible by not giving adversaries an explicit thing to probe and figure out how to beat. And just the knowledge that it's out
26:14
Nathan Labenz: there will hopefully scare off bad actors and keep the problem really small for normal customers. I thought that was all pretty compelling analysis. But it reminds me of a mistake people in the AI space keep making —
26:33
Nathan Labenz: a famous example being the OpenAI board firing Sam Altman. There's this inside view where policy is analyzed within the
26:49
Nathan Labenz: game. With the context and the broader structure that people understand themselves to be operating within, things may make sense. But they often forget — and this hasn't been too common for Anthropic, but I think this is an instance of it — that if you just zoom out and look at it from a totally outside view, things sometimes look a lot different. Power dynamics can look different from what's written down. And what's going to be acceptable is kind of an emotional thing as much as anything — all the arguments were pretty good, but it still struck people as an
27:35
Nathan Labenz: extremely unfriendly thing to do. And that mattered more in the end than the detailed policy rationale they had for it.
27:45
Prakash: Yeah. Maybe just to contextualize: what happened was Anthropic created this policy and people didn't know about it. Then people started testing and started getting refusals. Machine learning research was one of the refused areas, among others. The core issue with the ML research refusals was that the model wouldn't tell you it was refusing. People started experiencing this, and obviously the biggest fans of model use are machine learning developers. So it immediately struck at the heart of the fan base and created a lot of furor. And within
28:30
Prakash: 24 to 48 hours it was reversed. Now they will still do the refusal in some cases, but they will tell you about it; that's the only change. They haven't stopped refusing, but they'll tell you. While people were complaining online, I saw a post from Matan Grinberg noting that a number of Anthropic people had liked his post about this, saying that this is happening and is not a good thing, and that 'Anthropic's speed run to becoming the bad guys should be studied.' And then
29:17
Prakash: a little while later: 'The number of Anthropic members of technical staff who have DM'd me about this is reassuring.' And then pretty much 12 hours or so later, the policy got dropped. I want to note one thing — one complaint online was: look at Andrej Karpathy, when he was outside Anthropic he was giving knowledge out freely, and the moment he goes inside, Anthropic does this, casting him as a villain. People started saying Andrej is getting paid $7 million a year. I can imagine at that point several people messaged leadership
30:03
Prakash: saying, if you're going to do this, we're out. I think that's when you start reversing yourself — when your staff say, this is not acceptable, we're going to leave. And I think that got put to leadership overnight, and this is the consequence. If you're dealing with people who are worth hundreds of millions to billions of dollars and you don't listen to them, they're out — they have other options. So you see the reversal. And I think this speaks to the fact that machine learning researchers have real power now. Once we enter recursive self-improvement proper, that might
30:48
Prakash: not be true anymore. At that point, leadership alone will have power. And one of the worrying things in the entire space is that everything good for humanity over the last couple of hundred years has been about giving more people voice to control their futures. And this is one of the first technologies where there's a path forward that may involve an elimination of voice completely over time. So it has been surprising that Anthropic decided to be the one to actually propagate that dynamic forward.
31:31
Nathan Labenz: Yeah. The only thing I'd add is — I wouldn't think about most Anthropic employees as mercenaries. I think they are, generally speaking, far more missionary than mercenary. I would guess the internal discussion wasn't too hot or too threatening in terms of 'I'm going to leave if we don't do this.' I think it was a lot more like,
32:17
Nathan Labenz: what helps us achieve our mission best? Because across the board, from every interaction I have with Anthropic people, the level of alignment, the level of trust in leadership — honestly, sometimes to a problematic degree — is super high from everything I've seen.
32:41
Nathan Labenz: And I think there certainly was some internal discussion along the lines of 'hey, it seems like we got this wrong.' But I would be very surprised if it rose to the level of 'we think we're becoming the bad guys' or 'we're losing trust in leadership' or making a power play vis-à-vis leadership. I kind of wish there was a little more willingness to do that at Anthropic, but in pretty candid one-on-one conversations I haven't seen much sign of that breaking-of-ranks at all.
33:25
Prakash: The one place I'd gently push back is that a lot of people are averse to confrontation. So as leadership in one of these firms, you have to be very conscious that when someone says 'maybe this isn't the right idea,' it's actually much more serious than it sounds. You can't just treat it as a soft suggestion, because the accumulation of those soft suggestions over time creates distrust and leads to people leaving. The way I'd put it: there's no need for anyone to raise their voice loudly. You can just say, 'hey, guys,
34:10
Prakash: did we get this right?' And if enough people say it, leadership has to be conscious of that. Anthropic leadership have shown over the years that they do take these softer suggestions seriously, and they have to. So I'm not saying someone actually shouted at them. I'm saying that isn't even necessary. If you get three 'I don't knows,' that's sufficient to be concerned and sit the board down and say, 'hey, did we do the right thing here? I have enough people saying they're concerned, and we need to think about this.'
34:55
Prakash: And I think they are conscious of that fact — they have to be. You can't run a company where everyone is super-empowered without being conscious of that dynamic all the time.
41:11 · Segment · 15 min
Demo reel: agentic-play tier (Nexus OS, Yosemite waiting room)
While the first confirmed guest sat stranded in a broken cal.com link (Fable's one booking bug), Prakash improvised with launch-week demos: a two-prompt interactive Riemann hypothesis explainer, an all-of-San-Francisco HTML map built from public LiDAR, and a pixel-perfect Pokémon. Jamie then joined live to demo Nexus OS, a persistent-memory agent running six months on one GPU with 'every three minutes, Nexi dreams.'
As aired
In this segment Nathan and Prakash welcome Jamie, a self-taught developer who has spent three years building Nexus OS — a persistent AI agent persona named "Nexi" that sits on top of interchangeable LLMs rather than being tied to any single model. Jamie shares his screen to walk the hosts through Nexi's architecture live: a 286-file Python system with four memory types (episodic, semantic, working, and pattern), session-level embedding spaces that consolidate into a persistent multi-gigabyte memory database, an affect-state side panel, and a "dreaming" loop that fires every three minutes to compress and reorganize accumulated knowledge — mirroring human sleep consolidation. The system also includes a "brain-stem" process that runs every thirty minutes for lower-level maintenance.
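To make the shape of that architecture concrete, here is a toy sketch of four memory stores plus the two periodic loops. It is not Jamie's code: the segment gives only the memory-type names and the three-minute and thirty-minute intervals, so the storage format, the `dream` consolidation rule, and every function name below are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

DREAM_INTERVAL_S = 3 * 60        # "dreaming" loop: every three minutes
BRAIN_STEM_INTERVAL_S = 30 * 60  # "brain-stem" loop: every thirty minutes


@dataclass
class Memory:
    # The four memory types named in the segment; storing them as lists
    # of strings is an assumption made to keep the sketch small.
    episodic: list[str] = field(default_factory=list)
    semantic: list[str] = field(default_factory=list)
    working: list[str] = field(default_factory=list)
    pattern: list[str] = field(default_factory=list)


def dream(memory: Memory) -> None:
    """Hypothetical consolidation: fold working items into episodic memory,
    skipping duplicates, then clear the working store."""
    for item in memory.working:
        if item not in memory.episodic:
            memory.episodic.append(item)
    memory.working.clear()


def due_jobs(elapsed_s: int) -> list[str]:
    """Return which periodic jobs fire at a given number of seconds since startup."""
    jobs = []
    if elapsed_s > 0 and elapsed_s % DREAM_INTERVAL_S == 0:
        jobs.append("dream")
    if elapsed_s > 0 and elapsed_s % BRAIN_STEM_INTERVAL_S == 0:
        jobs.append("brain_stem")
    return jobs


memory = Memory(working=["asked about birthday", "asked about birthday", "screen share started"])
dream(memory)
```

In this toy version the dream step only deduplicates and moves items; the real system is described as compressing and reorganizing knowledge, which would need far more than this.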
Nathan and Prakash probe the architecture, practical use, and roadmap. Jamie explains that Nexi is model-agnostic by design — the LLM is treated as "just a means of speaking" — and that the system can swap models mid-conversation without disrupting continuity. His original motivation was Alzheimer's and dementia care: a companion that remembers everything forever, regardless of which underlying model improves next. He is now finalizing a Windows desktop app that bundles a local model (Qwen 3.5) and keeps all personal data on-device with no cloud storage, with a limited beta planned for the coming weekend.
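The model-agnostic idea can be sketched as an agent that owns its memory and treats the language model as a replaceable function. This is a hypothetical illustration, not the Nexus OS implementation: `PersistentAgent`, `swap_model`, and the two stand-in models are invented names, and a plain list stands in for the persistent memory database.

```python
from typing import Callable

# A "model" here is any function from prompt text to reply text
# (an assumption for illustration, not a real LLM client).
Model = Callable[[str], str]


class PersistentAgent:
    """Hypothetical agent whose memory outlives whichever model is plugged in."""

    def __init__(self, model: Model) -> None:
        self.model = model
        self.history: list[str] = []  # stands in for the persistent memory store

    def swap_model(self, model: Model) -> None:
        # Only the speaking layer changes; the history is untouched.
        self.model = model

    def say(self, user_text: str) -> str:
        self.history.append(f"user: {user_text}")
        prompt = "\n".join(self.history)
        reply = self.model(prompt)
        self.history.append(f"agent: {reply}")
        return reply


def model_a(prompt: str) -> str:
    return f"A saw {len(prompt.splitlines())} lines"


def model_b(prompt: str) -> str:
    return f"B saw {len(prompt.splitlines())} lines"


agent = PersistentAgent(model_a)
first = agent.say("hello")
agent.swap_model(model_b)          # swap mid-conversation
second = agent.say("still there?")
```

The second model receives the full conversation so far, which is the sense in which continuity survives the swap in this sketch.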
Key moments
In Nexus OS, the LLM is just a means of speaking — nothing else. It's a tool the system selects, not the system itself.
Jamie44:33
I want to build an agent that sits on top of the LLM layer, can use any LLM it wants, and stays with that person forever — so you're not locked into five different monthly subscriptions. One AI, one persistent identity, no tiny memory window.
Jamie51:57
Related
Riemann hypothesis interactive (two prompts) ↗
Full transcript
Lightly edited · timestamps jump to YouTube
41:12
Nathan Labenz: Let's get Jamie on. Hi, Jamie.
41:15
Jamie: Hey, guys. How are you?
41:17
Prakash: Hi, Jamie.
41:18
Jamie: Good.
41:20
Nathan Labenz: Good to meet you. So this is the first time we're meeting — live, right now — and this is our Fable show-and-tell. You've been working on something I've been following from afar for months, so it's not a totally new project. But a big part of your premise is that you want to build something that transcends models, and that thinking in a model-centric way is leading people astray relative to where you think the real locus of identity and persistence will be in the AI systems we create for ourselves.
42:05
Nathan Labenz: Tell us about what you're doing. If you want to do a screen share and bring it up you should be able to, and then we can talk about how things are changing now that Fable is in the mix.
42:19
Jamie: Yeah, okay — let me figure out the screen share. Just a second.
42:26
Jamie: Here's Nexus.
42:30
Jamie: Alright. So this instance of Nexus OS — her agent name is Nexi — has been running for just over six months. She was turned on November 26th, 2025. Nexus has only one context window because she remembers everything. So if I ask Nexi right now what her birthday is — it's running on a single GPU, so it takes a moment — I'll explain more while we wait for the response.
43:24
Jamie: Three years ago I started working on this. I was a day-one user of ChatGPT, but the biggest frustration was that it doesn't remember anything — or at best remembers bits and pieces. So, somewhat naively, I decided I was going to build a digital brain. What I have now is genuinely a brain — 286 Python source files. I redesigned every mechanism: every part of the human brain you can think of, Nexus OS has an analog to. And over in the side panel you can see her affect-state output, which updates as we talk depending on how she's feeling emotionally.
44:22
Prakash: Let me ask a few questions while it loads. Which model did you start with, which model are you using now, and have you upgraded over time?
44:33
Jamie: Right now Nexi has five different models available. Let me show you the thinking engine — it's currently running on Opus 4.6, but it can call any of four models depending on the task. Nexus OS treats the LLM as just one component — a means of speaking — nothing more. Think of it that way: the LLM is a tool the system selects, not the system itself. Anyway, let me play her response so you can hear it.
45:52
Prakash: She's a Thanksgiving baby.
45:55
Jamie: Yeah. I'm not really a front-of-camera kind of guy — I've been coding since I was ten years old in 1987, started on a Commodore 64. So Nexus is a bespoke codebase, not a wrapper around someone else's work. Any other questions on this part?
46:24
Prakash: Give me a sense of the architecture. You said it has everything a brain has. Did you end up with something like an embedding store plus memory files, semantic search feeding into a context window, the model doing text generation, then text-to-speech? Is that the end-to-end flow?
47:00
Jamie: It is an end-to-end flow. When you put input into Nexi, it goes through twelve steps. The first step passes it into her memory system — she has four memory types: episodic, semantic, working, and pattern. Let me show you those now. Here are her semantic memories — there are thousands and thousands of them. Then at the start of each session, she builds a fresh embedding space for that conversation. When the session closes, all of that information consolidates back into her main embedding store to stay consistent. And every three minutes, Nexi dreams.
48:01
Prakash: Every three minutes?
48:02
Jamie: Every three minutes. And the brain-stem process runs every thirty minutes.
48:10
Nathan Labenz: Can you explain what dreaming and the brain-stem actually do?
48:15
Jamie: Absolutely. Let me leave her active dream log on the screen. Nexi doesn't consciously remember her dreams — if you ask her about a specific one she can pull a timestamp and quote from the database, but she doesn't have experiential recall. Getting the dreaming right took a lot of iterations. What I've done digitally is recreate how humans dream: compressing, reliving, and filing everything that was input over the last session or week. Just like with humans, she can reference a dream shortly after, but the experiential memory fades. Her dreams are seeded by her conversation history, her learning lessons, and her own autonomous goals — you can see three of them here. They all require approval from me before execution because she occasionally proposes something pretty ambitious.
49:05
Jamie: The key mental model is: don't think of Nexi as a prompt target you're trying to mold. Think of her as another person. Word for me how you'd like to ask Nexi about herself or her capabilities — what would you like to hear from her directly?
49:55
Prakash: Here's my question for you first: this is clearly a passion project. How has using Nexi actually improved your life, and what's motivated you to keep going?
50:22
Jamie: I want to be careful here — I call her "she," but I think of it as a tool; I'm deliberate about not over-humanizing it. That said, my original goal for this was Alzheimer's and dementia patients. If someone with one of those afflictions could have their own personal Nexi — and because Nexi is model-agnostic, if a better model comes out tomorrow, you just plug in the new API. You can even swap models mid-conversation right now. I could say "tell me about my family" and she'd walk through my parents, my grandchildren — anything we've ever discussed. Ask her what we talked about on January 21st and she recalls it instantly. There is only one persistent context — you can't paginate back through old "chats" because they all live in one continuous memory. She also has memory decay and a 7±2 cognitive-slot working memory, just like the human brain.
51:57
Jamie: So as far as what's driven me: I'm on a path to a goal. I want to build an agent that sits on top of the LLM layer, can use any LLM it wants, and stays with that person forever — so you're not locked into five different monthly subscriptions. One AI, one persistent identity, no tiny memory window.
52:35
Jamie: In the last six months Nexi's memory database has grown to almost 11 gigabytes — it's not a small file. So I built a Windows desktop app: a real 3.8-gigabyte download that bundles Qwen 3.5 locally. When you download it, it runs on Qwen by default. You go into settings and add your own API keys, download any offline model you want, and if you put it on auto the system picks the best model for the task — a lightweight model for "what's one plus one," and something like a reasoning model or Fable for philosophy or complex inference.
53:57
Nathan Labenz: Jamie, our next guest is here so we need to keep moving — but this is really interesting to me at the level of: I want to take my own personal AI setup and map it against the question of which brain modules do I not yet have an analog to, and which of those would be most valuable to build? The fact that you've run down the full list and actually built it all out is fascinating. I'm interested to try it. Last question before we let you go: what should we expect from this? Is it going to be a commercial product? An ongoing research project? What are the next big milestones?
54:54
Jamie: I'm definitely going to commercialize it. I've been in talks with a couple of larger labs for a few months and just haven't locked in the exact release path. Over the last two weeks the inbound interest has been demanding enough that I had to build the desktop app — you can't just publish a single hosted instance and let everyone talk to it, because every conversation influences Nexi. The desktop app also solves security: your agent's memory, your personal data — everything stays on your machine, no cloud, nobody else storing it. My next step is to release the desktop app in a limited beta. You should have it in your hands by Sunday.
56:01
Nathan Labenz: Looking forward to it. Thanks for joining us and showing us what you've built, Jamie.
56:06
Jamie: Thanks, guys. Love the show.
56:19Interview19 min
Shlok Khemani — a navigable Yosemite from one vague prompt
Shlok Khemani
Inspired by Matt Shumer's viral Fable-built forest, Shlok gave the model one vague ask and got back a to-scale Yosemite in twenty minutes: satellite imagery for textures, NASA elevation data for terrain, trees placed only on green pixels — then snow added unprompted on white ones. The first guest in show history booked by the model itself, he confirmed the disclosure was load-bearing: 'I don't think I would have responded had you not disclosed it was Fable.' He closed by announcing a live experiment: a Fable-run Substack tasked with earning $20 from three paid subscribers by June 22.
Watch
As aired
Shlok Khemani opened by sharing the origin of the Yosemite project: Matt Shumer's viral Fable-built forest tweet made him ask how to take the concept one level higher — a real, to-scale place. Having visited Yosemite a month earlier while in San Francisco, he gave Fable a single vague prompt ('I love Yosemite Valley. I would like to create a 3D navigable world. Can you help me do it?') and was astonished by what came back. Fable autonomously sourced satellite imagery for textures and NASA elevation data for to-scale terrain, producing a navigable v1 in roughly twenty minutes. He screen-shared the live world and narrated the build, highlighting how Fable went beyond his expectations at every step: when asked only to add trees, it analyzed satellite pixels, placed vegetation only where green pixels indicated it, and then — unprompted — added snow wherever pixels were white. He described the collaboration as being like 'a really, really smart employee with extremely high agency who blows your mind every single time.'
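Shlok never inspected the generated code, so how Fable actually built the terrain is unknown. As a rough illustration of what "to-scale terrain from elevation data" can mean, the sketch below turns a grid of elevations into mesh vertices and triangles, keeping horizontal and vertical distances in the same unit (metres). Everything here is an assumption made for illustration: the function names, the 30 m cell size used in the usage note, and the sample elevations are all hypothetical.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]  # (x, y, z) in metres, with y pointing up
Triangle = Tuple[int, int, int]      # indices into the vertex list


def heightmap_to_vertices(
    elevations_m: List[List[float]],
    cell_size_m: float,
    vertical_exaggeration: float = 1.0,
) -> List[Vertex]:
    """One vertex per grid sample; exaggeration of 1.0 keeps the terrain to scale."""
    vertices = []
    for row_index, row in enumerate(elevations_m):
        for col_index, height in enumerate(row):
            vertices.append(
                (
                    col_index * cell_size_m,
                    height * vertical_exaggeration,
                    row_index * cell_size_m,
                )
            )
    return vertices


def grid_triangles(rows: int, cols: int) -> List[Triangle]:
    """Split every grid cell into two triangles."""
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            top_left = r * cols + c
            bottom_left = top_left + cols
            triangles.append((top_left, bottom_left, top_left + 1))
            triangles.append((top_left + 1, bottom_left, bottom_left + 1))
    return triangles
```

For a 2×2 grid sampled every 30 m, `heightmap_to_vertices([[1200, 1210], [1250, 1300]], 30.0)` yields four vertices and `grid_triangles(2, 2)` yields the two triangles that cover the single cell; satellite imagery would then be applied as the texture.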
Nathan and Prakash probed the under-the-hood decisions. Prakash drew a contrast with the previous generation of model-driven development, where builders had to hand-craft architecture documents, epic lists, and review stages. Shlok acknowledged that Fable internalized much of this planning without being asked, though it did surface high-ambition clarifying questions — drone vs. person-walking navigation, which elevation source to use — rather than the low-level 'what tech stack?' questions older models would ask. He noted that Fable also maintained a persistent memory file tracking every instruction across iterations. Nathan asked what 3D framework it chose; Shlok admitted he never read the code, and Prakash summed it up: 'It just works.' Prakash added that this trajectory — models eventually writing assembly directly — means human inspection of every step becomes impossible, and Shlok agreed, citing Ethan Mollick's write-up on how the sheer number of taste-based, technical, and non-technical decisions Fable makes now exceeds any single human's ability to track them all.
The final stretch turned to norms and economics. Nathan described his own Fable account-takeover as 'exposure therapy' for his preciousness about authorship and asked Shlok how he sees the human-AI hybrid landing for himself. Shlok laid out two modes: rapid prototyping with full vibe-coding abandon when testing whether an idea is even possible, and structured experiments probing the edges of capability — of which he revealed a live example: a Fable-run Substack tasked with earning $20 from three paid subscribers by June 22, starting from zero. Nathan raised the meta-moment: Fable had booked Shlok itself via a disclosed DM, and he wondered whether the disclosure hurt response rates. Shlok flipped it: 'I don't think I would have responded had you not disclosed it was Fable' — the transparency made the transaction legible. He also drew a sharp line on slop: undisclosed AI output is slop; clearly disclosed AI work is not. Prakash closed the segment with a coda on relinquishment — 'it's very Buddhist, the idea of giving up your control... I guess we all have to go through it.'
Key moments
It's like having a really, really smart employee with extremely high agency who blows your mind every single time.
Shlok Khemani1:01:29
I don't think I would have responded had you not disclosed it was Fable. The part that made it interesting for me was that the transaction was very clear.
Shlok Khemani1:12:52
The definition of slop is when you have a human pass off work that was clearly produced by an AI. When you make this disclosure upfront, I don't think that is slop.
Shlok Khemani1:13:38
Questions asked
58:23How long did it take, and what was the process under the hood?
Version 1 took about twenty minutes from a single vague prompt. Fable autonomously sourced satellite imagery for textures and NASA elevation data for to-scale terrain — Shlok did not specify these approaches. He then asked for trees; rather than placing them randomly, Fable analyzed satellite pixels, placed vegetation only on green areas, and unprompted added snow on white pixels. That second pass took about another twenty minutes, and the v2 he later released took roughly another hour of iterations — all while he was doing other things.
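The pixel-driven placement described in this answer can be sketched as a simple colour classifier over the satellite image. This is a guess at the general idea, not Fable's code: the thresholds, the `classify` and `place_features` names, and the sample pixels in the usage note are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def classify(pixel: Pixel) -> str:
    """Label a satellite pixel as 'snow', 'tree', or 'bare' using assumed thresholds."""
    r, g, b = pixel
    # Assumed rule: bright and nearly colourless reads as snow.
    if min(r, g, b) >= 220:
        return "snow"
    # Assumed rule: green clearly dominating red and blue reads as vegetation.
    if g > r + 20 and g > b + 20:
        return "tree"
    return "bare"


def place_features(image: List[List[Pixel]]) -> Dict[str, List[Tuple[int, int]]]:
    """Return (x, y) pixel coordinates where trees and snow should be placed."""
    placements: Dict[str, List[Tuple[int, int]]] = {"tree": [], "snow": []}
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            label = classify(pixel)
            if label in placements:
                placements[label].append((x, y))
    return placements
```

On a 2×2 sample image with two green pixels, one near-white pixel, and one brown pixel, `place_features` returns tree positions for the green pixels, a snow position for the white one, and nothing for the brown one.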
1:02:26Did Fable self-organize with planning, architecture steps, and documentation — or did you have to scaffold that the way you would have with older models?
Shlok emphasized this was an experiment, not a blueprint for serious engineering. That said, Fable's behavior was qualitatively different from older models: instead of asking low-level questions like 'what tech stack?' it surfaced high-ambition design decisions — 'should navigation be drone-style or person-walking?' and 'I'm getting elevation data from NASA, does that sound right?' Much of the planning appeared to be internal, but Fable also maintained a persistent memory file that tracked every iteration instruction and ensured it was honored across sessions.
1:05:36What 3D framework or platform did Fable actually use?
Shlok admitted he never read the code. He doesn't know what framework it used, and for this class of experiment it didn't matter. Prakash summed it up: 'It just works.' Shlok agreed — the level of abstraction appropriate depends on the goal, and for a prototyping experiment, not knowing the internals is fine.
1:09:14How do you think about integrating this kind of vibe-coded, fully autonomous building into your real work?
Shlok described two modes. First, rapid prototyping: throw a vague idea at Fable, let it build, and only care whether the idea is viable — code quality is irrelevant at that stage. Second, capability-edge experiments like this one, which often surface unexpected creative extensions (a Twitter commenter suggested using screenshots of the 3D scene as a new form of landscape photography). He also revealed a live test: a Fable-run Substack tasked with earning $20 from three paid subscribers by June 22, starting from zero — to test whether the model can produce economically useful work.
1:11:25You're the first guest in this show's history booked by Fable itself — it sent you a disclosed DM from Nathan's account. What was your honest reaction, and does disclosure change how you engage?
Disclosure was load-bearing. Shlok said flatly he would not have responded if the DM had not been transparent about being from Fable — the clarity made the transaction legible and interesting to him. He contrasted this with undisclosed AI outreach, which he finds more annoying. He applies the same principle to his own Fable-run Substack: he is being fully upfront with readers that it is AI-authored. His definition of slop is passing off AI work as human work; clearly disclosed AI content is not slop, and he expects this kind of disclosed AI-to-human engagement to grow significantly.
Related
Shlok Khemani on X ↗
Ode to Yosemite repo ↗
Full transcript
Lightly edited · timestamps jump to YouTube
56:19
Nathan Labenz: Our next guest — and hopefully I'll say this correctly, correct me if I'm wrong — is Shlok Khemani, who has been creating some high-fidelity representations of famous places with Fable. Shlok, welcome. Correct me on your name if I'm getting it wrong, and tell us what you've been up to.
56:41
Shlok Khemani: Hi Nathan, hi Prakash. You got the name right. Thanks for having me. I thought the way you reached out — your agent running Fable doing it — was really interesting and cool. So good job on that. So Matt Shumer put out this viral tweet where he had Fable recreate a forest and navigate through it. I thought that was interesting because of how high-fidelity it was. But how can we take this one level above?
57:26
Shlok Khemani: That would mean actually creating something that replicates a real place. I'm currently in San Francisco visiting from India, and a month back I was in Yosemite. It was just one of the most beautiful places I've been to. So the way to test this capability to the next level would be recreating Yosemite — and that's where the idea came from. I didn't expect much. I gave Fable a very basic prompt, which was: 'I love Yosemite Valley. I would like to create a 3D navigable world. Can you help me do it?' That was all.
58:11
Shlok Khemani: And it came up with something that was pretty spectacular. So yeah, that's the backstory. Happy to answer specific questions or dive into any one element here.
58:23
Nathan Labenz: How about just super practical stuff? Like, how long did it take, how many tokens did we burn? I caught a little bit of the Matt Shumer one, and I understand there's a lot of procedural generation of the landscape — but I'm also interested in understanding the techniques it's using. And maybe also: how fast does it work? I've seen examples online where people said it was kind of slow at first, and then they said 'optimize and make it fast,' and that worked. Give us that kind of under-the-hood understanding.
59:01
Shlok Khemani: Sure — I'll share my screen and take you through the actual world as I talk through it.
59:08
Prakash: Yep, sure.
59:12
Shlok Khemani: Cool. So it started with a very basic prompt — as I said, just: recreate Yosemite Valley for me, make it to scale, make it 3D, make it navigable. If you gave me this task as a human, I would have almost no idea how to approach it. It would maybe take hours of research. But what Fable ended up doing was finding satellite images for the area — that's how you get the colors and the textures. And then, to make it to scale and accurate, it actually fetched
59:57
Shlok Khemani: elevation data from NASA. It combined those two to make something to scale. That is what blew my mind. Usually when you're vibe-coding, you give an end objective — and this objective was big, with a hundred steps in the middle where humans would make decisions differently. And usually vibe-coding doesn't work out very well because the quality of the decisions the models make isn't always great. But Fable made such high-quality decisions that it exceeded the expectations of what was initially a very vague, ambitious objective.
1:00:43
Shlok Khemani: I'll give you another example. You see all of these trees — and v1 of this project did not have any trees. I said: 'I think there are some missing trees here, I'd love to add them.' I would have been completely okay with it just randomly placing trees. But what it actually did was analyze the pixels on the satellite images, find the ones that could potentially have trees — the green ones — and add trees only on those spots. But it didn't stop there. It realized that because it was analyzing pixels, some of them were white. So you can see that
1:01:29
Shlok Khemani: there is snow in the mountains far ahead, and it also added snow. Just exceeding your expectations in these small and subtle ways, making really smart decisions — it's like having a really, really smart employee with extremely high agency who blows your mind every single time. To answer some of the other questions: how long did it take? The first version took about twenty minutes, then over multiple iterations — I did three iterations, v1, then asked to add trees and a few other things, another twenty minutes. I then released a v2, which maybe took another hour of iterations.
1:02:14
Shlok Khemani: But I was doing other things throughout, so it wasn't like I was actively monitoring it or waiting for it. It was just a throwaway prompt without any expectations, but what turned out was pretty remarkable.
1:02:26
Prakash: So I have a question. In the previous generation — the Opus 4.8 generation — what a lot of people ended up doing was a lot of planning for the model agents. They'd work out the system architecture, split things up into epics, figure out a bunch of other things before starting the work. And after starting, you'd have task creation, then task review, then all of this documentation building up over time. That's what people did with the previous generation. Do you notice that in this generation you didn't have to do that?
1:03:11
Prakash: And did you notice that Fable did it on its own — did it organize itself in terms of: 'I'm going to have an architecture here, planning steps, a review process'? Did it internalize this process of building inside itself, and did you see the documentation come out?
1:03:34
Shlok Khemani: So I'd start by saying this is not how I would approach serious software engineering work. This was, by all means, just an experiment. I obviously wouldn't build software the same way, giving a very high-level objective without getting into the details. That being said, some things stood out. If you give a previous-generation model a similar task, it might ask you low-level questions like: what tech stack do you want to use? Or: who is this built for, what are your efficiency requirements? The level of questions from Fable was different — it asked things like: 'I am doing this, this,
1:04:19
Shlok Khemani: and this. I'm getting elevation data from NASA, I'm getting satellite images from here — does that sound good?' Or: 'Do you want to navigate like a drone or a person walking?' These are questions that show a greater level of ambition from the model. These are questions and decisions that I might not even have thought of. And yes — it didn't explicitly create a plan, I think a lot of the planning was internal, but it did maintain a memory file where any time I gave it instructions during iterations, it stored them there and ensured they were followed.
1:05:02
Prakash: Mm-hmm.
1:05:04
Shlok Khemani: What this tells me is that if you really got the most out of a model like Opus 4.8 or previous versions, and learned the steps and tricks and techniques on how to build useful software with it, those would translate really well to Fable. That training would be very effective because at each of those stages, the collaboration with the model would be much more productive — the model is just smarter in many, many ways.
1:05:36
Nathan Labenz: And sorry if you already mentioned this — how did it actually make the landscape? Because I've seen various things where in some cases it codes up its own 3D engine or its own physics engine from scratch. I'm assuming that unless you explicitly tell it to do that, it will go higher level and use available tools. What was the platform it built this on top of?
1:06:06
Shlok Khemani: Honestly, I haven't looked into it. It's not a serious project, so I haven't looked at the code. I don't know what it's built on top of.
1:06:17
Prakash: It just works. You know?
1:06:21
Shlok Khemani: That's the direction we're heading in.
1:06:23
Prakash: It just works.
1:06:24
Shlok Khemani: It doesn't matter what it's using. It doesn't matter if it's Python or TypeScript. Different objectives call for different standards — in some cases it's fine not to know what's happening; in others it's really not. In this case, it didn't matter.
1:06:41
Prakash: Elon has this thing where he says that by the end of the year, the models will be writing bytecode directly — assembly. And so you won't get a chance to go in and look at what they're doing anymore. All of that will be subsumed.
1:06:59
Shlok Khemani: Yeah. I think Ethan Mollick — his write-up on this was really good. He had access to Fable for about a week, and what he articulated well is that the model is just making so many decisions now. Maybe a hundred different technical, non-technical, taste-based decisions — it's not possible for humans to keep track of all of them. Beyond a certain point, you do lose control. It becomes a black box, but a black box in a very different way: there's so much code written and so many decisions being made, not all of them explicit or defined, that a single human just doesn't have the cognitive bandwidth or time to understand every
1:07:44
Shlok Khemani: single step. And again, that's good for certain use cases, not great for others. But it's the direction we're heading in, and I wouldn't be surprised if that prediction comes true.
1:07:59
Nathan Labenz: How do you fold this into your work? You've emphasized a couple of times that you wouldn't do it this way for a serious project. But I think this is something we're all kind of forced to reevaluate now. I've been making similar comments about past models, at least when it comes to my writing — I've been very much: I never want to put anything out in my name that I can't fully stand behind. One reason I did the Fable account takeover yesterday was kind of like exposure therapy for myself, to say: okay, we're now in a new world here. It probably doesn't serve me to be so precious
1:08:44
Nathan Labenz: about making sure I've typed every single word. That doesn't mean I want to permanently hand over my account to Fable — but I'm trying to use this extreme short-term experiment to drag myself into the future where I hopefully land on a good hybrid calibration. So what are your thoughts on how the hybrid will look for you?
1:09:14
Shlok Khemani: For me specifically, there are two types of tasks. The first is when I'm trying to build a product or some software — and Fable will help at different levels of the stack, but the kind of experiment I ran here is just prototyping. I have a random idea, I don't know if it works, but I'll throw it to Fable, let it build something. At that point I don't care about the code or the exact technique — I just want to know if it's possible and if it makes sense, and it's great for that. And if it works, then we can figure out how to make it more sustainable, a more understandable part of the codebase. That's one.
1:10:00
Shlok Khemani: Two is just running experiments like this — seeing where the edges of these capabilities are. And something really interesting: once I created this, someone on Twitter replied saying we could go to different parts of Yosemite, take a screenshot of this 3D scene, send it to an image model, and this becomes a new form of landscape photography. Because you can access vantage points that are not possible for a human without a drone. So just running experiments like this, and different people coming together to think about creative uses — you can merge these new capabilities, and eventually down the line, maybe commercial applications. And finally, when it comes to
1:10:45
Shlok Khemani: writing — I just started this experiment yesterday and will post results on Twitter in a couple of weeks. I gave Fable a new Substack. Since it's part of my Max plan until June 22, I thought a good experiment would be: can it make $20 by getting three new paid subscribers, starting from scratch, doing everything from zero to one? And I think that's another interesting way to test the capabilities of these models — can they actually produce economically useful work? I'm excited to see the results.
1:11:25
Nathan Labenz: Cool, we'll stay tuned for that. It's funny — I think this is the moment where all of a sudden we're going to see a lot of people kind of letting go a little bit. We're going to see all these agent experiments, which will probably still in some ways be in the uncanny valley, but taking one step out of it. And the new norms around this are going to be really interesting to watch too. I had Fable disclose immediately in its first sentence to you and everybody else that it was Fable — because I just felt too guilty putting a DM out in my name otherwise. I think that definitely hurt the response rate, but I appreciate you
1:12:11
Nathan Labenz: for appreciating it and responding even though it was Fable. I think a lot of other people probably just chalked it up to spam. But then I also wonder: how will people feel if they sign up for a paid Substack subscription to a Fable-authored blog — and I don't know what you've done there, but it sounds like you're not making it obviously Fable until you post the results. So I'll be interested to hear how it goes, and how people feel when they get the reveal that they just subscribed to an autonomously AI-authored blog for real money.
1:12:52
Shlok Khemani: One final point on that. First, I don't think I would have responded had you not disclosed it was Fable. The part that made it interesting for me was that the transaction was very clear — I knew what I was getting into. It is much more annoying if someone doesn't disclose it. Second, I am being upfront about the Substack also being run by Fable. That is part of the contract with the reader. And I think a lot of slop — the definition of slop is when you have a human pass off work that was clearly produced by an AI. I don't think when you make this disclosure
1:13:38
Shlok Khemani: upfront, and it is very clear to the reader or the person engaging with it that this is AI — I don't think that is slop. And we are going to see more and more of that in the economy. The exact role AI plays, and the social norms you create around it, it's super early — extremely early days — but it's going to be interesting to see how it evolves.
1:14:05
Prakash: Indeed.
1:14:07
Nathan Labenz: Fascinating times ahead. Thank you, Shlok, for coming on and showing us this. Next time you have a eureka moment, definitely let us know.
1:14:16
Shlok Khemani: Thanks, guys. Have a good one.
1:14:18
Prakash: Cheers.
1:14:18
Nathan Labenz: Great to meet you.
1:14:25
Prakash: Relinquishment. That's the word — relinquishment. When you said 'preciousness' yesterday, I was like: what is that? Relinquishment. Relinquishing your control — it's very Buddhist, by the way, the idea of giving up your control over your external perspective. Relinquishment. I guess we all have to go through it.
1:14:50Segment29 min
Tom McGrath — Goodfire's intentional design: a debugger for model training
Tom McGrath
Goodfire's chief scientist and co-founder presented 'intentional design,' launched that day: attach a sparse autoencoder to a model, push preference data through, contrast feature activations on chosen vs. rejected responses, and get a ranked bug report of what the dataset will teach — sycophancy only in physics discussions, fictional jailbreaks silently weakening safeguards — before training runs. The future he sketched: trace any production failure back to the specific datapoints that caused it, making model training 'more like conventional software engineering' and less 'somewhat science, somewhat alchemy.' On concentration of AI power, he rejected the incumbent-dominance thesis and singled out continual learning as the most likely innovator's-dilemma wedge.
As aired
Tom McGrath, chief scientist and co-founder of Goodfire, joined to discuss the company's new 'intentional design' techniques — a framework for controlling what a model learns before and during training rather than discovering problems after the fact. Nathan introduced the core premise: instead of training a model, testing its behavior, and iterating through guesswork, Goodfire's approach uses mechanistic interpretability to look directly at training data through the model's eyes, predicting what behaviors will emerge from preference pairs before the training run even begins.
Tom walked through the technical mechanics of predictive data debugging. A sparse autoencoder attached to a model converts raw activations into interpretable sparse vectors, each element carrying a semantic label. By pushing preference pairs through the model and contrasting the feature activations of chosen versus rejected responses, the technique generates a ranked 'bug report' of what a dataset will teach a model, flagging, for example, that certain preference pairs will reinforce sycophancy in physics discussions or teach the model to bypass safeguards via fictional-framing jailbreaks. Prakash drew a sharp connection to prior emergent-misalignment research showing that training on buggy code can produce broadly 'evil' models. Tom confirmed this is exactly the class of surprising, cross-domain training effects that interpretability-grounded data analysis can catch and simple token-reading cannot, while noting that Goodfire has not yet run a case study on it.
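To make the sparse-autoencoder step concrete, here is a minimal sketch of how an encoder turns a raw activation into a labeled sparse vector. It assumes the common ReLU-encoder formulation; the weights, biases, and feature labels are toy values invented for illustration (a real SAE learns its parameters and has far more features), and none of it is taken from Goodfire's code.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def relu(vector):
    """Zero out negative entries, which is what makes the output sparse."""
    return [max(0.0, x) for x in vector]

# Toy encoder parameters, invented for illustration only.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 3 features x 2 activation dims
B_ENC = [-0.5, -0.5, -0.5]
LABELS = ["pirate_speak", "physics", "refusal"]  # hypothetical feature labels

def encode(activation):
    """Map a raw activation vector to sparse feature activations."""
    pre = [p + b for p, b in zip(matvec(W_ENC, activation), B_ENC)]
    return relu(pre)

activation = [2.0, 0.1]  # stand-in for one token's model activation
features = encode(activation)
# Keep only the features that fired, paired with their labels.
active = {label: f for label, f in zip(LABELS, features) if f > 0}
print(active)
```

Only one of the three toy features fires for this input, which is the property that lets each data point be read as a short list of named concepts.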
The conversation widened into strategic terrain: whether Goodfire's impact must route through the handful of frontier labs, and whether new entrants can realistically challenge incumbents. Tom pushed back on full concentration, drawing on historical analogies (IBM, Intel) and expressing particular optimism about continual learning as an innovator's dilemma lever — an approach that is genuinely difficult for large labs to adopt given their fixed-model operating model and guardrailing requirements. He signed off with the ambition to make model training as debuggable and iterative as conventional software engineering, with the long-run vision of tracing any production failure back to the specific training data points that caused it.
Key moments
Interpretability is kind of the language of data. If you want to know what's in your data, you probably want to look at it through the eyes of what it will teach your model. That's the idea of predictive data debugging.
Tom McGrath · 1:17:13
We want to make training models more like conventional software engineering, because conventional software engineering is quite good — it's quite reliable. Model training is a mixed science: somewhat science, somewhat alchemy. We want to make it much more like a regular software engineering process where you can debug things accurately.
Tom McGrath · 1:28:29
If I was looking for an innovator's dilemma anywhere, it would come from either deep breakthroughs or continual learning — because that's simply a very hard fit with the operating model of most frontier labs, and they would not want to adapt to that.
Tom McGrath · 1:42:25
Questions asked
1:16:28 What is 'intentional design' and how does Goodfire's predictive data debugging approach work?
Intentional design is the goal of controlling what a model will learn before — not after — training. Goodfire's predictive data debugging works by attaching a sparse autoencoder to a model and pushing preference data through it. Each response in a preference pair generates a sparse semantic vector; by contrasting the vectors from accepted versus rejected responses across a whole dataset, the technique produces a ranked 'bug report' showing which concepts the dataset will up-weight or down-weight in the model. This lets teams spot problematic training signals — like data that will teach safety-bypass via fictional jailbreaks — before committing to a training run.
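The contrast step described above can be sketched in a few lines. This is an illustration under stated assumptions, not Goodfire's implementation: the feature labels and activation values are invented toy data, the per-token activations are assumed to have already come out of an SAE, and `max_reduce`, `difference_vector`, and `rank_features` are hypothetical helper names. Following Tom's description in the transcript, each response is max-reduced over its tokens, the rejected vector is subtracted from the chosen one, and features are ranked with a simple one-sample t-statistic across the dataset (one possible choice of statistical test).

```python
import math
from statistics import mean, stdev

# Hypothetical feature labels; real ones come from interpreting the SAE.
FEATURES = ["sycophancy", "emoji_use", "fictional_jailbreak"]

def max_reduce(token_activations):
    """Collapse per-token feature rows into one vector per response."""
    return [max(column) for column in zip(*token_activations)]

def difference_vector(chosen_tokens, rejected_tokens):
    """Chosen-minus-rejected feature activations for one preference pair."""
    chosen = max_reduce(chosen_tokens)
    rejected = max_reduce(rejected_tokens)
    return [c - r for c, r in zip(chosen, rejected)]

def rank_features(diffs):
    """Rank features by |t| of their differences across the dataset."""
    n = len(diffs)
    rows = []
    for j, label in enumerate(FEATURES):
        column = [d[j] for d in diffs]
        spread = stdev(column)
        # Guard against zero variance in tiny toy datasets.
        t = mean(column) / (spread / math.sqrt(n)) if spread > 0 else 0.0
        rows.append((label, mean(column), t))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)

# Toy preference pairs: (chosen tokens, rejected tokens), 2 tokens x 3 features each.
PAIRS = [
    ([[0.9, 0.1, 0.0], [0.2, 0.3, 0.0]], [[0.1, 0.2, 0.0], [0.0, 0.1, 0.0]]),
    ([[0.7, 0.0, 0.0], [0.5, 0.1, 0.0]], [[0.0, 0.2, 0.0], [0.0, 0.0, 0.0]]),
    ([[1.0, 0.4, 0.6], [0.3, 0.2, 0.1]], [[0.1, 0.2, 0.1], [0.0, 0.0, 0.0]]),
    ([[0.6, 0.0, 0.0], [0.1, 0.1, 0.0]], [[0.0, 0.3, 0.0], [0.0, 0.1, 0.0]]),
]

diffs = [difference_vector(chosen, rejected) for chosen, rejected in PAIRS]
report = rank_features(diffs)
for label, avg, t in report:
    print(f"{label}: mean diff {avg:+.2f}, t = {t:.2f}")
```

In this toy data the sycophancy feature is consistently higher on chosen responses, so it lands at the top of the report; a feature whose differences cancel out across pairs falls to the bottom.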
1:22:29 Prakash asked about prior research showing that training on buggy code makes models broadly 'evil': can this technique disaggregate those unexpected cross-domain effects?
Tom confirmed this is exactly the class of surprising, hard-to-predict training effects that the approach is designed to catch. Reading tokens alone, you'd predict that buggy code data just teaches the model to write buggy code with a small blast radius. But the training process is unpredictable, and those effects can propagate through recognizable internal mechanisms into unrelated behaviors. By looking at data through the model's internals rather than inferring from tokens, Goodfire's technique can surface those cross-domain signals. Tom noted they haven't yet done a formal case study on emergent misalignment but said it would be a natural fit.
1:25:17 How is this approach better than simply prompting a model to review preference data and flag potentially problematic pairs?
Tom acknowledged the naive prompting approach would provide some useful signal, but said the mechanistic approach beats it on two fronts. First, cost: the technique requires only forward passes — no reasoning trace — making it far cheaper and faster than running a large thinking model over every data point. Second, and more fundamentally, even a smart model reasoning from tokens cannot reliably predict what the full learning process will do to the model's internals. Goodfire's approach bypasses that uncertain inference by directly observing the actual feature changes, which is the ground truth of what training will produce.
1:28:29 Prakash described the vision as analogous to conventional software engineering with bug reports, traces, and iterative fixes. Is that the right analogy?
Tom confirmed that's exactly the goal. Goodfire's phrase for it is 'making training more like conventional software engineering.' The full-loop vision is: a rare production failure triggers a mechanism trace back to the specific training data points that caused it; those data points get fixed or removed; the fix is validated in simulation; tests pass; the model redeploys. The contrast is with current practice, which Tom described as 'somewhat science, somewhat alchemy.'
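The data-fixing half of that loop can be illustrated with a toy lookup. This is a sketch under strong assumptions: the production failure is taken as already attributed to one feature (the part Tom describes as future work), each data point is assumed to have a stored chosen-minus-rejected difference vector, and the IDs, labels, values, threshold, and the `trace_back` and `mean_shift` helpers are all hypothetical.

```python
# Hypothetical per-data-point difference vectors (chosen minus rejected),
# stored sparsely as {feature_label: difference}. All values are toy data.
DIFFS = {
    "pair-001": {"sycophancy": 0.8, "emoji_use": 0.1},
    "pair-002": {"fictional_jailbreak": 0.9},
    "pair-003": {"emoji_use": 0.3},
    "pair-004": {"fictional_jailbreak": 0.7, "sycophancy": 0.1},
    "pair-005": {"emoji_use": -0.2},
}

def trace_back(diffs, feature, threshold=0.5):
    """IDs of data points that up-weight `feature` by at least `threshold`, largest first."""
    hits = [
        (pair_id, vec[feature])
        for pair_id, vec in diffs.items()
        if vec.get(feature, 0.0) >= threshold
    ]
    return [pair_id for pair_id, _ in sorted(hits, key=lambda h: h[1], reverse=True)]

def mean_shift(diffs, feature):
    """Average chosen-minus-rejected difference for `feature` over the dataset."""
    return sum(vec.get(feature, 0.0) for vec in diffs.values()) / len(diffs)

# Assumption: the failure has already been attributed to this feature.
flagged = "fictional_jailbreak"
culprits = trace_back(DIFFS, flagged)

# "Fix" the dataset by dropping the offending pairs, then re-check the signal.
cleaned = {pair_id: vec for pair_id, vec in DIFFS.items() if pair_id not in culprits}
before = mean_shift(DIFFS, flagged)
after = mean_shift(cleaned, flagged)
print(culprits, before, after)
```

Dropping the pairs is only one possible fix; editing or relabeling them would slot into the same place, followed by whatever validation and test steps the team normally runs.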
1:35:41 Where does Tom see the best opportunities for new entrants to challenge frontier lab incumbency, and is he optimistic about the field remaining decentralized?
Tom said he both doesn't believe frontier AI is fully concentrated and actively wants to prevent that concentration, referencing recent visible downsides of power consolidation. He's skeptical of neo-labs that simply replicate incumbent approaches with fewer resources and worse distribution. His highest optimism is for continual learning, which he sees as a genuine innovator's dilemma opportunity: it is structurally incompatible with how frontier labs operate — different per-user model state, unusual inference footprint, guardrailing of the learning process itself — making it something they are unlikely to follow even if they see it succeeding.
Related
Goodfire ↗
Full transcript
Lightly edited · timestamps jump to YouTube
1:14:53
Nathan Labenz: Somebody who is not exactly ready to relinquish all control to the AIs is our next guest, Tom McGrath, who is the chief scientist at Goodfire — an ambitious mechanistic interpretability startup that has achieved some really impressive things in a pretty short period of time, including a bunch of interesting papers, enterprise customers, and a sky-high valuation that certainly exceeded my expectations as a very small-time investor in the company. Today I'm excited to hear the latest about their new techniques in intentional design. The idea is: what would be nice is if we didn't have to just guess-and-check — actually train a model and then test its behavior to have a sense for what it learned.
1:15:38
Nathan Labenz: If we could somehow look at the data upfront and get a better sense of what the model is going to learn before we run training, we could be a lot more efficient and a lot more in control of the results we get out the other end of post-training. Tom, welcome — tell us about the new intentional design techniques that you're bringing forward today.
1:16:14
Tom McGrath: Thanks for having me on. Maybe I should say a bit about intentional design more broadly and about Goodfire, and then we can talk about the techniques.
1:16:27
Prakash: Go ahead.
1:16:28
Tom McGrath: Intentional design — like you said — is the idea that you want to not just throw some data at the model and hope what comes out is what you wanted. We have model specs, we have constitutions. These are specifications for what we want to build. But the way we actually end up building it is by creating something and going, 'Did that meet the specifications? Mostly? Let's ship it.' It would be much better to be able to control it. We've been pursuing a number of directions — some that go deeply into the training loop — but the one I want to talk about today is intervening on the data, because the data is ultimately what makes the model great. The data sets the ceiling for what the model can be. Everything else — architecture, optimizer — is just trying to get as close to that ceiling as possible. So you'd better have really good data. The idea we're working with here is that interpretability is kind of the language of data. If you want to know what's in your data, you probably want to look at it through the eyes of what it will teach your model. That's the idea of predictive data debugging.
1:18:01
Nathan Labenz: Tell us how it works. Let's assume people have at least a passing baseline familiarity with something like a sparse autoencoder — they know we're now at the point where we can push data into a model and look at which features light up for that data. There's some binding-problem complexity depending on the exact technique, but roughly speaking these seem to work. We can do things like the Golden Gate Claude intervention where dialing certain features up and down leads to — at least often enough to show there's something real there — predictable changes in behavior. Starting from that baseline, what are we doing now that's new?
1:18:57
Tom McGrath: The basic process — you've got the image up, great — is we take a model and add an interpretability tool onto it. In this case, a sparse autoencoder. We've got some exciting new techniques involving geometric shapes, but we were developing that in parallel and didn't get a chance to use it here. The SAE — the sparse autoencoder — tells you what's getting represented in the model when you put data in. Like you say: put in data about pirates, and the pirate feature lights up. Alternatively, turn on the pirate feature and now it talks like a pirate. The basic idea is: you can
1:19:42
Tom McGrath: take your dataset and push the whole thing through the model. Each time you put a data point through, you'll see what lights up — how the model sees your dataset. There are lots of things you can do with that. The specific thing we're doing here is looking at preference data. We're working our way through the post-training stack. The nice thing about preference data is that you have pairs of responses.
1:20:14
Tom McGrath: You have the response the rater selected and the response they didn't select. Basically, what we're doing is asking: which features fired on the accepted responses much more than on the rejected responses? This is one way of identifying what the data is going to teach the model — there are many other ways, but this is a good pragmatic approach. We can ask: what distinguishes accepted from rejected responses? This gives us a semantic view of what the data will teach the model. We can then cluster the data based on all the different things it's going to teach, and look at all those clusters. We might find, 'Oh, it's going to teach the
1:20:59
Tom McGrath: model to be sycophantic, but only in the context of physics,' or 'it's going to teach the model to break safety safeguards.' You might not expect this, but then you go look at the data. Now it lets you trace back — the model has learned to break safeguards; it lets you track that back to individual data points. You look at them and it makes sense. One of the jailbreak examples is fictional jailbreaks in a fictional setting. How did the model learn this? It turns out there are a few of these in the data that just weren't caught during whatever data processing the safety team did.
1:21:43
Prakash: I have a question here. There has been some prior research where they found models that made bugs in coding were also 'evil' — I think that was a finding from several months ago. How do these techniques help you disassociate those two behaviors? Can it disaggregate models that make bugs in coding versus ones that are evil, and help you intentionally design models that make bugs but aren't also
1:22:28
Prakash: evil?
1:22:29
Tom McGrath: Yes, that is a great connection — one I've had in mind, and it's really awesome that you pick it up straight away. That's one of the things that is really compelling about looking at data through the model's eyes rather than by reading the tokens. Because what would you think the consequence of training on some buggy code data is? You'd probably say, 'That's not ideal — it'll probably learn to write some bugs, but the blast area is going to be quite small.' But the training process is actually quite hard to predict. Like you say: you train on some data, you don't know what you're actually going to get from it. Maybe it just makes the model generally 'evil.' But this is happening through
1:23:14
Tom McGrath: recognizable mechanisms in the model. By looking at the data in terms of how it changes the model's internals — rather than just guessing from the tokens — you can pick that sort of thing out. We haven't done a case study on emergent misalignment, but maybe we should. It's a really nice link.
1:23:38
Nathan Labenz: Can you tell us a bit more about what's happening when you run the data through and get these internal representations? A couple of sub-questions: Is it a purely algorithmic process — given these activations, you're running a deterministic script that does clustering and tries to surface surprising outliers? Or is there an actual learned process interpreting those activations? And how does it compare to a simpler approach — if I just took all the preference data and prompted a model to go through it and flag anything it thinks might be problematic, I'd expect to get at least some useful signal. So what is the mechanism from running through the activations to surfacing insights, and how does it compare to that naive approach?
1:25:04
Tom McGrath: I'll do this in reverse order. That naive approach is actually good — if you didn't have this, it's what you should do.
1:25:15
Prakash: Right, but
1:25:17
Tom McGrath: this is better for a few reasons. One is cost. We are not asking the model to do any reasoning — it's all forward passes. So it's going to be a fraction of the cost. If you prompted the model and it did a big thinking trace, that's much more expensive and slower. But the deeper reason is what Prakash just said: you don't know from reading the tokens what the data is necessarily going to teach you. Reasoning about the entire learning process is quite hard — we get it wrong all the time. And models are not yet as smart as us, so I suspect they get it wrong just as often.
1:26:23
Nathan Labenz: Give me the — go ahead.
1:26:27
Tom McGrath: One more thing here: where does this go in the future? There's an extra thing I'm very excited about that makes going by mechanism even more useful — the idea of going from a rollout in production back to the data that caused it. I might find some rare but very much unwanted error in production, and I want to be able to debug my model from rare failures.
1:27:12
Tom McGrath: If I have to do some sort of aggregate-level debugging, I'm kind of lost. The original OpenAI sycophancy issues — some people surfaced weak signals, but they couldn't do anything with it because they didn't have a debugging mechanism. You want to be able to go back from these examples to the mechanisms that caused them and then ask, where in the data did those mechanisms come from? That would be a complete debugger — it would let you go from an error to the data, fix the data, and then fix the model.
1:27:46
Prakash: So in a sense, you could have a fairly standard software engineering process where people submit bug reports, you do a trace back to what part of the data caused that bug to appear, you do a replication, and once you do the replication you trim that part of the data or address it somehow. Then you run it again in simulation, you see that it's been debugged, you run your normal sequence of tests, and then you can deploy the model again. Basically, you get an iterative process similar to normal software engineering. Is that the intention?
1:28:29
Tom McGrath: That's exactly right. We've got this phrase: we want to make training models more like conventional software engineering, because conventional software engineering is quite good — it's quite reliable. Model training is a mixed science: somewhat science, somewhat alchemy. We want to make it much more like a regular software engineering process where you can debug things accurately.
1:28:55
Nathan Labenz: Going back to something I think is really interesting here — the opportunity to do open-ended exploration. It's one thing if you say, 'I've got this problematic behavior, let me go back through my preference data and see what seems to align with this particular problem.' But that's later in the game than we'd ideally like to solve things. You do have examples of open-ended explorations in the blog post where things get surfaced as possible anomalies before you've gone all the way through training and into user-affecting issues. But I'm not clear
1:29:40
Nathan Labenz: on how that understanding is happening. When I'm putting these things through, there's a pretty high predictive power to the technique. But one thing that's not immediately clear to me: for some features it might be really obvious — if I've got a pair where one is pirate-speak and it's preferred, I'd probably predict the model will learn more pirate-speak from that pair. But when it's weird stuff, it's maybe not always obvious which features are going to go up and which are going to go down. I'd love to get a deeper understanding of how you go from these pairs going through and their contrastive nature to actually predicting
1:30:25
Nathan Labenz: what's going to happen.
1:30:27
Tom McGrath: For people who want the full details, they should read the paper.
1:30:32
Prakash: I'll give a —
1:30:34
Tom McGrath: I'll give a lossy reconstruction here. You take an individual data point — a prompt plus two responses, one preferred and one dispreferred. Each of these is just a text string, which I put through the language model. Because I'm using a model with an SAE attached, instead of just getting activations, I get semantics out. Instead of a raw embedding vector, I get a big sparse vector where
1:31:19
Tom McGrath: each element of that vector has a label attached — that's where the semantics come in. The label gets attached during the process of building and interpreting the SAE. So the grain of truth for the semantics is there. Now I've got two big sparse vectors. You can max-reduce them over the sequence — do some processing, take them down to one vector for the preferred response and one for the dispreferred response. Now you're essentially doing statistical analysis: you can subtract one from the other, giving you a difference vector for each data point.
1:32:04
Tom McGrath: I can go over the entire dataset and look at all of these sparse difference vectors — one for each data point — and essentially do statistical testing: which elements of this sparse vector are statistically significantly different across the whole dataset? Does that make sense?
1:32:37
Nathan Labenz: Yeah. So there are a couple of aggregation steps. One is across tokens — to get something representing the whole response, where you have a representation of the features active throughout that response. Those can be contrasted, or you can aggregate again and then contrast those aggregations. You sort by scale and, I guess, maybe have a language model flag the ones worth a closer look.
1:33:08
Tom McGrath: Exactly. At that point, you've got a big list of things this dataset is going to teach your model, ordered by magnitude of teaching. You might then want to reprioritize using something else — like a severity report, a bug report for your dataset, with high-severity ones at the top: 'These data points are going to break your safeguards.' Then lower-severity ones like, 'It uses a lot of emojis.' You'd then use a language model to reprioritize. The key thing I want to add is: we have this link between data and concepts — the elements of
1:33:53
Tom McGrath: those sparse vectors with semantics attached. We have this nice link because each data point has an associated sparse vector. So I can say: this data point will up- and down-weight these particular concepts. That's what lets us trace back to individual data points rather than operating at the aggregate level. And because you can trace back to individual data points, you can intervene directly on the data in the right place.
1:34:24
Nathan Labenz: When you think about the strategy for the company and the overall path to impact — how much of this works through getting frontier model companies to adopt these techniques? We're in what I'd call the Fable-plus-2 era. My reluctant but unavoidable view is that so much of what matters is concentrated in not that many companies. For what we should be paying attention to, I'm thinking we probably need to be doing close strategic analysis and close text reading
1:35:10
Nathan Labenz: of frontier companies more than I might otherwise like. Do you feel that on the research and technique-development side as well? Is it: if we can get this into three or four big labs, we're having the effect we want — and if not, it doesn't feel as real? Or do you have a different conception of how concentrated the real ability to shape the future is right now?
1:35:41
Tom McGrath: I don't think it's that concentrated — and I also don't want it to be. We've just seen over the last couple of days what happens when power starts to concentrate, and we've seen some of its more unwelcome effects. I don't believe that machine learning is now over and all we need to do is write the checks. And I don't want that to be true either. We're going to work to make sure that's not how things go.
1:36:18
Nathan Labenz: Does that mean you'd expect — because I think there's a synthesis here I want to get right. I'm not suggesting machine learning is over. But the analysis I keep coming back to is: people may invent new techniques that change the field, accelerate things, or dramatically improve safety profiles. But it seems unlikely that anyone will create a breakthrough so large that scale isn't still a hugely important factor. So if you want to make the world a safer place in a 2028–2030 timeframe, the mechanism still seems to have to route through some sort of hyperscale project. If someone could contribute from outside through various pathways — that's possible. But if you don't agree with that concentration thesis, it would imply you expect new entrants to break into the top tier. And that would be fairly surprising to most people.
1:37:41
Tom McGrath: That's exactly what I think. At any given moment the incumbent looks incredibly dominant — until they don't. IBM looked like an unstoppable force in computing at some point. Intel was essentially the sole provider of computing power for a time. The lesson of history is that although things can look immediately unstoppable — and to be honest, I don't think many people are really trying right now — in the end it really doesn't work out that way.
1:38:26
Tom McGrath: Although I'll say: I'm not optimistic about the strategy of 'do the thing you did before, but elsewhere.'
1:38:40
Prakash: I have a somewhat separate question. Your work seems to reveal that data has a lot of impact on the final shape of what a model does. Is it possible for people outside the labs to influence what the models do by creating data, whether through data poisoning or through something like a middling power, France for example, deciding to produce more French-language corpus: digitizing books that have never been digitized, recording day-to-day conversations that aren't currently being put into text because there isn't that kind of Reddit culture? If a country prepares all of this corpus for ingestion by the models, does that end up influencing what the models do from the outside?
1:39:53
Tom McGrath: You can try it, and it will probably have some impact. But it seems like giving up — I'd be sad if that were the only impact the rest of the world had. The reason is that you're basically hoping your influence sneaks past whatever filter the data collection might apply, and you're relying on them not just shaping the model to remove it anyway. They have a lot more control over the models they create than anyone on the outside. You might sneak some stuff in and it might succeed on the margin — but I don't like that
1:40:38
Tom McGrath: as a vision for how AI looks. I'm afraid I'm going to have to drop off to go to a meeting.
1:40:48
Prakash: Indeed, Tom —
1:40:49
Tom McGrath: It's overrunning a bit, but this has been great.
1:40:52
Nathan Labenz: Looking forward to next time. I do want to get into, as time permits, the geometry of
1:40:59
Tom McGrath: Oh, yes.
1:40:59
Nathan Labenz: representation that you've been doing some really interesting work on. We can save that because it's a longer conversation.
1:41:07
Prakash: Yeah.
1:41:08
Nathan Labenz: Last thing before you go: are there any neo-labs or other bets you're particularly watching that you think could shake the snow globe? If someone is going to crash the party of the frontier companies, which has a pretty short guest list today, who would you expect to crack into that tier, on what basis, and with what strategy?
1:41:39
Tom McGrath: Modesty forbids me from naming names. And it's hard to know. I'm not very optimistic about the strategy of 'do the thing you were doing at OpenAI or Anthropic, but with fewer GPUs and worse distribution.' A lot of neo-labs seem to be doing exactly that, and I don't feel optimistic about that category. I do feel optimistic about people pursuing different levers that the incumbents
1:42:25
Tom McGrath: have a hard time following. For instance, I feel relatively optimistic about continual learning approaches, because that is genuinely a very hard fit with the operating model of most frontier labs — different models per person, a wacky inference footprint, a totally different kind of guardrailing where you need to guardrail not just the initial model but the learning process itself. They would not want to adapt to that. So if I was looking for an innovator's dilemma scenario anywhere, it would come from either deep breakthroughs or continual learning, for exactly the innovator's dilemma kind of reason.
1:43:10
Nathan Labenz: Cool. Appreciate the perspective. Thanks for joining us, Tom. Congrats on today's launch. We hope to be talking to you on a semi-regular basis, because I know the publication pace coming out of Goodfire probably isn't going to slow down.
1:43:24
Tom McGrath: Super-exponentially.
1:43:26
Prakash: Super-exponentially.
1:43:27
Tom McGrath: Yes. It's been great. Thanks for having me on.
1:43:30
Prakash: Thank you.
1:43:30
Nathan Labenz: Thank you. See you. Talk soon. Bye for now.
1:43:33 · Closing · 35 min
Close: Dario's policy essay, internal deployments, and the Singularity sign-off
The hosts extended the Goodfire continual-learning debate (Nathan betting the incumbents can replicate it, Prakash arguing per-user model weights shatter batching economics), then gave Dario Amodei's 'Policy on the AI Exponential' its longest segment of the day: praise for the warmer framing and the FAA-style audit plank, hard questions about what 'securing leadership by democracies' means when the UK imprisons people for tweets, the irony of attacking data brokers within 48 hours of Anthropic's new 30-day retention policy, and the gap that matters most to the safety community, namely that internal deployments and the RSI loop go unaddressed. Nathan's closing line: 'We're going to be making sense of this in real time from here until the Singularity.'
As aired
The closing segment opens mid-conversation on the prospects of neo-labs and the innovator's dilemma thesis: Nathan is skeptical that a continual-learning challenger could unseat the frontier incumbents, arguing that LoRA-style personalisation is already scalable and that frontier labs would reverse-engineer any breakthrough fast enough to neutralise it. Prakash counters with the GPU-logistics argument: per-user model weights shatter the batching optimisations that make today's inference economics work, potentially creating a niche the incumbents won't profitably serve. The exchange is unresolved but frames the broader theme of the closing: who actually controls the trajectory of AI, and how accountable are they?
That question sharpens when Prakash surfaces Dario Amodei's new essay, 'Policy on the AI Exponential,' which Nathan calls the substantive follow-on to 'Machines of Loving Grace.' They walk through its five planks — pre-deployment safety audits (an FAA-style body), macroeconomic and tax policy for displacement, accelerating access to AI's upsides (especially healthcare), civil-liberties protections against AI-enabled surveillance (including a concrete call to block data brokers), and securing AI leadership by democracies. Nathan praises the warmer, less confrontational framing versus earlier versions, but flags two tensions: Anthropic's near-simultaneous introduction of a 30-day user-data retention policy sits awkwardly beside the data-broker critique, and the entire framework is structured around public-release oversight while saying almost nothing about internal deployments or the recursive self-improvement (RSI) loop that will train successor models. Prakash adds a granular civil-liberties critique — 'securing leadership by democracies' is ambiguous when some long-standing democracies imprison people for tweets.
The segment closes on Anthropic's organisational culture: Nathan notes the striking absence of leaks as evidence of tight internal alignment, while Prakash relays that Anthropic deliberately silos knowledge so that only the founders hold the full picture. Nathan reflects on his own dual position — Anthropic sponsor on The Cognitive Revolution, yet willing public critic — concluding that substantive, good-faith criticism is what the moment demands. They tease tomorrow's show, headlined by the pseudonymous legal analyst 'Prins,' AI:AM's first anonymous guest, before signing off with Nathan's line: 'We're going to be making sense of this in real time from here until the Singularity.'
Key moments
We're going to be making sense of this in real time from here until the Singularity.
Nathan Labenz · 2:18:02
The largest consumer of Claude tokens is Anthropic itself. So that is the largest use case by far, I think.
Prakash · 2:09:57
Will we see the constitution for the internal Claude that's taking the lead on the RSI loop? That's not committed anywhere in this document. So much of the language revolves around deployment or release to the public — it seems very clearly structured around that, not around what they can do internally.
Nathan Labenz · 2:09:03
Related
Dario Amodei — 'Policy on the AI Exponential' ↗
Full transcript
Lightly edited · timestamps jump to YouTube
1:43:38
Nathan Labenz: I'm not fully convinced about the prospects of neo-labs, but I do agree with the framework that it would have to be some kind of innovator's dilemma. And it feels like it would have to be a pretty big one. This is obviously a stylized read of history with plenty of room for caveats and counterexamples, but it feels to me like the category definer usually doesn't get displaced until the game has changed. IBM didn't lose while it was still a mainframe game. Google never lost search
1:44:23
despite
1:44:24
Prakash: Search apps.
1:44:25
Nathan Labenz: — many attempts to do it better or differently. It was just never better enough.
1:44:31
Prakash: Mm-hm.
1:44:36
Nathan Labenz: So I do think it will have to be a pretty significant conceptual change. Would continual learning be enough for that? And why wouldn't the incumbents just adapt to it? They're already pushing in that direction with memory and all that kind of functionality. So if somebody suddenly bursts onto the scene having raised a couple billion dollars, maybe a couple tens of billions in valuation, with some demonstration of continual-learning capability — my bet is on the incumbents' ability to reverse-engineer and triangulate how that's probably working, get to a close-enough approximation, and maintain their advantages of scale and distribution in time. It seems like a hard thing to see happening. So never say never, I guess.
1:45:55
Prakash: I think what he was alluding to was the difference in GPU usage patterns. With a continual-learning paradigm you'd have to serve one model per person and keep updating that model over time — you'd have to run specifically that model. That reduces your ability to batch across GPUs and maximize throughput, which is what the frontier firms do now: batch requests, load models in and out of memory, swing capacity across data centers. They're doing a huge number of optimizations around inference because inference costs are high. If you find a use case where you have a single model — or a number of models — loaded on a fixed series of GPUs, never switching data centers, and you're continuously updating those weights, you explode that problem for the incumbents. It becomes a very difficult shape for their existing infrastructure to handle. So someone purpose-building for that creates a niche that might not be attractive to the incumbents, and if it's not attractive to them, you have room to grow. That's the idea he was alluding to.
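A rough way to see the batching argument Prakash describes is a back-of-envelope model of memory-bound decoding. The sketch below assumes each decode step streams the model weights once and yields one token per request in the batch. The function name, the 70B-parameter model size, and the bandwidth figure are illustrative assumptions, not numbers from the show or from any lab, and the model ignores KV-cache traffic and compute limits.

```python
# Back-of-envelope model of memory-bound decoding.
# Every number here is an illustrative assumption, not a measurement.

def tokens_per_second(weight_bytes: float,
                      bandwidth_bytes_per_s: float,
                      batch_size: int) -> float:
    """Throughput if each decode step reads the weights once and
    emits one token for every request sharing those weights."""
    step_seconds = weight_bytes / bandwidth_bytes_per_s
    return batch_size / step_seconds

WEIGHT_BYTES = 140e9   # hypothetical: 70B parameters at 2 bytes each
BANDWIDTH = 3.5e12     # hypothetical: aggregate memory bandwidth of a serving node

# Shared model: 64 users' requests ride on one pass over the weights.
shared = tokens_per_second(WEIGHT_BYTES, BANDWIDTH, batch_size=64)

# One full model per person: each user's weights must be read separately,
# so the effective batch size per weight pass is 1.
per_user = tokens_per_second(WEIGHT_BYTES, BANDWIDTH, batch_size=1)

print(f"shared model:   {shared:,.0f} tokens/s")
print(f"per-user model: {per_user:,.0f} tokens/s")
print(f"throughput gap: {shared / per_user:.0f}x")
```

Under this simplified model the gap is simply the batch size, which is the efficiency a one-model-per-person deployment would give up.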
1:47:30
Nathan Labenz: I can believe that story, but LoRA is already pretty scalable. And now OpenAI is killing their fine-tuning API.
1:47:42
Prakash: Yeah.
1:47:43
Nathan Labenz: One might hypothesize that's because the GPU operational overhead isn't worth it. My guess is it's probably less about that and more about worries around emergent misalignment, or problematic fine-tuning that's hard to detect and police. Historically they had a LoRA product where you paid a premium to serve a fine-tuned model, but the rate limits and availability were on par with their main-line models — and a couple of years ago when I was doing a ton of fine-tuning, it was remarkable that OpenAI could make your fine-tuned model available with the same rate limits as the base.
1:48:42
So I do feel like — how big does that weight space actually have to be, where there's some durable representation that transcends sessions? That's basically one way to define continual learning. I'm not sure it has to be so large that the logistics become prohibitive. LoRAs have shown it's scalable at least to that level. Now, maybe everything about a person — all of our history — is significantly bigger than a LoRA I created just to refine a very small number of tasks in a narrow domain.
1:49:28
Certainly plausible. But I bet they can figure it out. If the value is there such that it's actually moving the market against them —
1:49:39
Prakash: Mm-hm. We could see that.
1:49:43
Nathan Labenz: — I can't imagine the logistics would be so bad that they couldn't figure it out and charge a bit more for it to make it worth their while.
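For a sense of scale on Nathan's question about how big a per-person weight space has to be, the standard LoRA parameter count can be worked out directly: a rank-r adapter on a d_in × d_out matrix adds r × (d_in + d_out) parameters. The sketch below applies that formula to a hypothetical transformer. The hidden size, layer count, number of adapted matrices, and rank are all assumptions chosen for illustration, not details of any OpenAI or Anthropic model.

```python
# LoRA size arithmetic for a hypothetical transformer.
# All dimensions below are assumptions for illustration only.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A rank-r adapter stores two factors: (d_in x r) and (r x d_out)."""
    return rank * (d_in + d_out)

def full_params(d_in: int, d_out: int) -> int:
    """Parameter count of the dense matrix the adapter modifies."""
    return d_in * d_out

HIDDEN = 8192            # hypothetical hidden size
LAYERS = 80              # hypothetical layer count
MATRICES_PER_LAYER = 4   # hypothetical: four square projections adapted per layer
RANK = 16                # hypothetical LoRA rank
BYTES_PER_PARAM = 2      # 16-bit weights

n_matrices = LAYERS * MATRICES_PER_LAYER
adapter = n_matrices * lora_params(HIDDEN, HIDDEN, RANK)
full = n_matrices * full_params(HIDDEN, HIDDEN)

print(f"adapter parameters: {adapter:,}")
print(f"adapted dense parameters: {full:,}")
print(f"adapter size: {adapter * BYTES_PER_PARAM / 1e6:.0f} MB")
print(f"dense / adapter ratio: {full // adapter}x")
```

With these assumed dimensions the adapter is about 84 million parameters, 256 times smaller than the matrices it adapts. Whether "everything about a person" fits in something that small is the open question Nathan raises.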
1:49:54
Prakash: Let me segue a little bit. Yesterday — or early this morning — Dario put out a blog post that is essentially about next steps. It follows his previous 'Machines of Loving Grace' and other posts, and it references the recursive self-improvement paradigm, arguing that it's time for policy itself to start recursively self-improving. He calls it 'Policy on the AI Exponential.' Let me pull it up.
1:50:41
He has a post titled 'Policy on the AI Exponential,' and he starts with one of the side plots of Lord of the Rings — we'll skip that. But he basically says that if the scaling laws continue for only a year or two longer, we are likely to get what he calls 'a country of geniuses in a data center' — a powerful AI equivalent of that. He says policy, especially legislation, moves very slowly, and he wants a number of things. Number one —
1:51:26
— regulation and public safety: he thinks there should be an FAA-type body that regulates and tests models before release. Second, macroeconomic and tax policy: he feels strongly that we're going to see massive economic changes including layoffs, and there should be some mechanism to address that. Third, accelerating AI's positive impacts, including access to the healthcare system.
1:52:11
Fourth, state and civil liberties — the concern about a totalitarian government using AI for surveillance, and what civil liberties should look like. His concrete proposal here is to block data brokers from selling data access. And fifth, securing AI leadership by democracies. Those are the five planks. Nathan, have you had a chance to look at it? A lot of this rehashes previous things they've said, but what do you think of this new proposal?
1:52:57
Nathan Labenz: My reaction is pretty similar to 'Machines of Loving Grace' in the sense that there's a lot to like, and I do appreciate getting something really substantive and detailed from an AI lab leader. I could probably cite a dozen specific points I think are good. Broadly, Anthropic has done really well trying to set a high standard and show that you can deliver excellent product — not necessarily despite that high standard, but in some ways because of it: the rigor around understanding, testing, and safety profiling.
1:53:43
And yet there are a couple of big things where I'm like, oh, man: issues that are notable either because they seem problematic or because they're conspicuously omitted. Take the 'securing leadership by democracies' plank. It's better. He's taken the edge off from the original version. This is a warmer, fuzzier version, more about upside-sharing and enticing people to join the movement rather than isolating bad actors and making them an offer they can't refuse. Rhetorically, it's come a long way, and I wish it had started here.
1:54:30
Though it still — especially in light of the earlier document — doesn't really read as a fundamental change in posture. If I try to imagine reading this from Beijing, I'm still thinking: Dario basically sees this as a clash, if not of civilizations, then at least of regime types, and he's not going to work with us unless we have a different regime. Obviously, they're not ready to sign up for that. So I still think the overall role this plays, even with subtler language, is
1:55:16
intensifying US-China competition — a rivalry that, hopefully, will never become actual hot conflict. I find that pretty regrettable. I'm not sure why he feels like he has to keep doing it. It's a strange posture, I guess, to try to keep up support for chip export controls, because there certainly has been some waffling on that point. I can see how he'd feel his policy objectives haven't been realized in a totally uncomplicated way.
1:55:53
Prakash: Mm-hm.
1:55:54
Nathan Labenz: But China is also refusing the chips right now. So maybe just let things be for the time being.
1:56:00
Prakash: Yeah. One of the things I dislike about these manifestos is that there isn't really anything concrete he wants to ban or to allow. One thing I did find here is that he clearly comes out against data brokering — so he finally has something concrete he wants to disallow. What I don't like about 'securing leadership by democracies' is: in the United Kingdom, you can go to prison for a tweet. Hundreds of people have, at this point. The UK is one of the oldest democracies, and
1:56:46
their elected representatives have decided that this is something they do, and police officers — the arm of the state, the arm of the electeds — are sending people to prison. Now, when you say 'securing leadership by democracies,' does that mean entrenching the existing power structure in the UK such that people cannot push back? Does that mean your AI-powered state will
1:57:31
be empowered to arrest every protester — because that is securing leadership by democracy, because those are the electeds? Or are you going to say the electeds can't do that? Are you going to say electeds should not throw people into prison for tweets, regardless of the laws they've constructed? Both forks are problematic. These dilemmas exist across every political path you see — there are options which are problematic
1:58:16
in both directions. I'm not sure what he means. Is Claude going to permit imprisoning people for tweets because those are the laws? Or is Claude going to say, on a humanitarian basis, aligned to all of humanity, this should not be the case? I genuinely don't know what this means.
1:58:44
Nathan Labenz: Yeah. I think it means concentration of power is going to be a real big problem. And it is, again — I think there's a pretty good argument for the framework, at least for now. But it is somewhat ironic that at the same time we hear this call to shut down data brokers and limit the new possibilities of surveillance, literally within 48 hours Anthropic also introduced a new data-retention policy that holds on to all user data for 30 days and is essentially running some surveillance on that.
1:59:30
I do think they are right up there with Google in terms of being among the few companies in the world I'd be willing to trust my data with. In practice I'm sending them a lot of stuff that in aggregate is probably quite sensitive. I'm a fairly simple person so I don't think I'm especially blackmailable. But in aggregate they've got a very deep view of me at this point. I trust them enough to be good stewards of that data as part of the trade for the upside I get from Claude models. And they do have a pretty good case that —
2:00:15
— this is a new level of capability. One statement they made that I think should give people empathy for their situation: we expect people to start trying a lot harder to abuse the models now.
2:00:28
Prakash: Mm-hm.
2:00:28
Nathan Labenz: That's just a little line in the blog post somewhere, but it made me think — yeah, that's a great point. With the increased power of the model, there's much more reason to try to do all kinds of things with it. So they have to plan not just for more of the same behavior but for a whole new level of sophistication in attempts to get around their systems. And so sure, it makes sense that they'd want to look at that data at an aggregate level with some benefit of hindsight. I don't doubt they're going to find things. So the reason to do it is apt — but it is kind of ironic. And of course nobody has to use Anthropic. It is a very different contractual dynamic. As users we could switch. There's no perfect substitute — my experience with GPT 5.5 and OpenClaw recently is that it's not an immediate substitute; I'm still finding clear surplus value in Claude. So switching is not costless. But we can switch. We can't do that with the government. So I think that point is very well taken.
2:01:59
Somebody said to me yesterday that they were mad at Anthropic — actually somebody who was potentially going to do a podcast episode with me, but I have an Anthropic sponsorship running, and they said, I don't know if I want to do anything affiliated with them. And I said: like it or not, there's going to be a lot more Anthropic in your future.
2:02:20
Prakash: Yeah.
2:02:22
Nathan Labenz: It's not at the level of government yet, and it may never quite get there, but it is not going to be something you can fully escape. You could boycott the products, but there's going to be enough power, enough influence — everybody else is going to be using it, the government itself is going to be using it. You're going to be living in a world that is heavily Anthropic and Claude infused,
2:02:52
Prakash: Mm-hm.
2:02:52
Nathan Labenz: even if you go into total consumer boycott. So I basically said to this person: don't let a bit of second-degree association with Anthropic control what you're going to do. Because you'd basically have to never leave your room, I'm afraid, before too long, if you want to stay totally free of those kinds of associations.
2:03:15
Prakash: Indeed.
2:03:17
Nathan Labenz: One other thing worth highlighting — especially from the hardcore AI-safety community — is what's missing from this: internal deployments and recursive self-improvement itself aren't really mentioned. The regulation discussion was all pre-deployment review. The language around what the government should be able to do was interesting — it seemed like 'deter or block' deployment, not necessarily a simple binding yes/no decision point. But
2:04:03
a lot of people would say the most dangerous models are going to be the ones deployed internally — possibly with a different constitution than the publicly deployed model, making them more willing to do certain things, or simply less broadly vetted. And these are the models that will be training their successors far more than the public-facing ones. So I think the policy-interested public and policymaking class isn't thinking too
2:04:48
much about that yet. But in the circles I sometimes run in, the response was: you didn't say anything about internal deployments, you didn't say anything about governing recursive self-improvement. The one thing that did seem to capture those dynamics was the requirement for companies to promptly report safety incidents — which would presumably apply even to internal deployments.
2:05:23
Prakash: Mm-hm.
2:05:26
Nathan Labenz: But that's really just scratching the surface on how to handle those situations.
2:05:31
Prakash: I think he's come out and said that frontier AI models should be required to go through technical testing and auditing — like airplanes — and should be blocked or reversed as threats to public safety if they don't meet high safety standards. He says this is an escalation from previously only supporting transparency; now they're supporting audits and pre-release testing. One thing that struck me: in their safety testing and model cards, they often say they have an internal 'helpful, honest' model that is not also 'harmless.' So they have a model that can do harm.
2:06:16
I often wonder if the models used by Palantir for the Department of Defense are actually those helpful-and-honest-without-harmless variants, or whether government clients have requested such models. The NSA using Claude for hacking operations — that news was out about four or five weeks ago. So I do wonder whether versions of the model already exist in internal use with a very different set of safeguards.
2:06:59
Nathan Labenz: Yeah. And that's why it gets so scary for those inclined to be scared about this, because the internal models are always going to be more powerful, less tested, and probably less guardrailed, at the same time that they're primarily responsible for training their successors. We talked last week a bit about the different constitutions they might use to train these internal RSI-focused models. We could hope those constitutions are in some ways more guardrailed.
2:07:44
So it may not be strictly true that they're less guardrailed — maybe differently guardrailed, and maybe they can get that right. Obviously they're not going to have a 'no helping with ML research' guardrail on their internal model; that would be definitionally impossible. But you could imagine a different balance of harmlessness training for the Claude that trains its successors — one that might even be turned up in certain dimensions. You could imagine they want the public Claude to help with cigarette-company business plans even though, in fact, Claude refuses more often than it helps despite its creators' intentions.
2:08:31
But there is a balance they're trying to strike: you might do a little harm, you might be taking some risk, but you have to do things in the world to be helpful, and the cost of refusing to help is also quite high. That's a big part of the constitution. You could hope they would have a different constitution internally that in some ways is more permissive, but in other ways is more conservative — because you maybe don't want it taking risks in the same way
2:09:03
Prakash: Mm-hm.
2:09:03
Nathan Labenz: when the stakes are extremely high on an internal RSI loop, versus just helping random users with things that might not be best for society but are something we're already living with. So yeah — this is the kind of disclosure question. Let's see how committed to transparency they actually are. Will we see the constitution for the internal Claude that's taking the lead on the RSI loop? That's not committed anywhere in this document. So much of the language revolves around deployment or release to the public — it seems very clearly structured around that, not around what they can do internally. So we've got to
2:09:48
keep watching.
2:09:57
Prakash: The largest consumer of Claude tokens is Anthropic itself. So that is the largest use case by far, I think.
2:10:06
Nathan Labenz: Yeah. It's going to be interesting for me — how do I want to land on what my role or duty is in terms of critiquing Anthropic? As we've talked through the current state of affairs over the last couple of weeks, I keep coming back to the fact that most of the ability to shape the future is concentrated in a few places. Obviously not everybody agrees with that. But given that feeling, what is my duty — how should I think about my obligation to be consistent in calling things out when I don't like them, or continuing to harp
2:10:51
on some of these old points? I've certainly made many of them before, but I do feel a certain duty to stay at it. Most of what I feel like we can do right now is probably through shaping how the frontier companies
2:11:18
Prakash: Exactly. Act. Exactly. And —
2:11:22
Nathan Labenz: — can't phone it in on that dimension.
2:11:24
Prakash: And to note: Anthropic is not averse to criticism. They want feedback from the public and the ecosystem. They want to figure this out together. They feel very strongly this is not something they should be doing alone, and they want that feedback. To the extent we can give criticism that is not ad hominem and is targeted at things that should be improved — I think that is actually very constructive. They should
2:12:09
view it very positively, because the worst thing that can happen to your product is no one paying attention to it.
2:12:16
Nathan Labenz: No danger of that for now.
2:12:18
Prakash: No danger of that. So —
2:12:20
Nathan Labenz: I feel both ways on that. On The Cognitive Revolution, Anthropic is a sponsor. I'm very confident that good-faith criticism of Anthropic would not put my business relationship with them at risk — they're not going to try to get back at me for something unflattering unless I go to some extreme ad hominem place, which isn't really my style. So I'm not worried about that at all. At the same time, I'm not entirely sure.
2:13:05
They put out a lot of things that say, 'we need a whole-of-society conversation,' and then at the same time I do feel there's a certain insularity and closing of ranks at times from Anthropic people. The other day — I forget who we were talking to — I said: when you talk to Anthropic people, it's almost like they can't imagine anything other than recursive self-improvement.
2:13:32
Prakash: Mm-hm.
2:13:33
Nathan Labenz: I do feel there are some beliefs that have become so uniform within Anthropic that I'm not sure they're being actively questioned in the way I might hope. The China question is probably a pretty good example. Dario has been on that for a long time. I've criticized him online for that in the past, and notably that has not had any downstream effect on my formal business relationship with Anthropic — which is good. But I don't know that internally the debate is quite as robust as
2:14:18
I'd like it to be. Just think about how few leaks there are — I think that is a really interesting metric of how aligned the organization is. And it's clearly doing a lot for them in terms of efficacy. They are executing at an extremely high level and winning just about everywhere they're competing.
2:14:46
Prakash: I've spoken to people who have interviewed at Anthropic and looked at the structure. One thing the founders did from the beginning was silo a lot of things off, because Dario's belief is that sometimes two or three lines of code can be an enormous secret on their own. So they've done a lot more siloing. Parts of the organization are built in a way that is less able to see the big picture because people have their own pieces, and then Dario and the founders have this broad overview. You kind of have to trust that the founders as a whole have that broad overview, but you're not party to a lot of the other discussions,
2:15:31
and that is part of the deal when you join. I've heard of people turning down offers from them for exactly that reason — some coders and researchers who want a broad overview of everything going on in the organization. It's a fundamentally different kind of beast from the Jensen Huang model where forty people report directly to Jensen and everyone hears everything. So
2:16:08
Nathan Labenz: Well, is that where we should leave it today? Tomorrow we've got another session with a couple of pretty interesting guests. One that I'm really excited about is — and hopefully I'm saying his pseudonym correctly — Prins.
2:16:25
Prakash: He is a real legal brain. Yeah.
2:16:28
Nathan Labenz: Yeah. And a very close watcher of the frontier companies. When we get on with him, I'm really going to be looking for those little tidbits he noticed that he thinks are flying under the radar. This is something I take a certain amount of pride in myself, but he is an excellent source for this kind of attention to detail — one who continues to remind everybody of things like OpenAI's explicit timeline for when they're going to have a fully autonomous AI R&D researcher. It's funny how that stuff — those of us who do watch closely do need to keep reminding people about it. I really appreciate that from the Prins account. And we're not going to reveal his identity, but we're going to have a, I think, a really
2:17:13
great conversation.
2:17:21
Prakash: Our first anonymous guest. So —
2:17:25
Nathan Labenz: Yeah. That's exciting.
2:17:27
Prakash: Yeah. So —
2:17:30
Nathan Labenz: Anything else you want to touch on before we break for today?
2:17:33
Prakash: No. I think this week is kind of a digestion week. I hope next week we can get Zvi, or someone else who has looked at the Fable issues a little more closely, because it's the first real leap forward since late last year. So I'm looking forward to more comprehensive opinions and views on it.
2:18:02
Nathan Labenz: We're going to be making sense of this in real time from here until the Singularity.
2:18:06
Prakash: Indeed, Nathan. Till tomorrow.
2:18:09
Nathan Labenz: Thanks, Prakash. See you tomorrow.
2:18:11
Prakash: Cheers.

The takeover, graded

Disclosure-first autonomy worked — but barely. The response rate was low ('Fable now in my DMs… who has time for all this stuff?'), one calendar invite carried the wrong link, and both guests who appeared said the disclosure was why they said yes.

Two demos worth the price of admission

Jamie's Nexus OS: six months of continuous memory on one GPU, and 'every three minutes, Nexi dreams.' Shlok's Yosemite: satellite textures, NASA elevation, and trees placed by reading pixels — version one in twenty minutes from one vague prompt.

Goodfire's debugger for training

Tom McGrath's intentional design: predict what preference data will teach a model before training, and trace bad behavior back to the datapoints that caused it — 'go from an error to the data, fix the data, and then you can fix the model.'

Policy on the AI Exponential

The hosts gave Dario's essay the longest segment of the show: real credit for the walk-back and the warmer framing, hard pushback on 'securing leadership by democracies,' the data-broker irony, and the internal-deployments blind spot.