EPISODE 2026-06-15

AI:AM LIVE — June 15, 2026 — US vs Anthropic's Fable, with Zvi Mowshowitz

The weekend the US government pulled Anthropic's two most powerful models. A federal export-control directive suspended Fable 5 and Mythos 5 for all foreign nationals — forcing Anthropic to disable both for everyone — and by Monday the reported reasons no longer agreed: a competitive-lobbying story, a China-access scare, and a political-retaliation read, none of them the original jailbreak. Then a long conversation with Zvi Mowshowitz on the widening power-control gap, Anthropic's strategy on two fronts, and who should steward what comes next.

▶ Full show on YouTube

Monday's show opened on the weekend's defining story: the US government's export-control suspension of Anthropic's Fable 5 and Mythos 5, and a set of reported explanations that no longer agreed with each other. Then a long conversation with Zvi Mowshowitz on what the episode reveals about capability, control, and strategy.

Nathan and Prakash began by processing the personal whiplash of losing Fable — a model both had immediately begun rewiring their workflows around — before Zvi joined to provide strategic context. The show's opening segment ran nearly three times its planned length as the discussion of Fable's capabilities, its system card, and the ensuing policy drama proved impossible to compress.

The rundown

15:01Opening64 min
Opening — the Fable ban inverts: three reported reasons, none of them the jailbreakNathan and Prakash opened by processing the whiplash of Fable's sudden Friday-night disappearance. Nathan described rewriting his entire working philosophy around the model — including letting it take over his Twitter account as 'shock therapy' to accelerate the adjustment — only to lose access within days. Prakash was more measured, estimating the capability gap at roughly 20% over Opus, while acknowledging Fable had crossed something real: for him it was the first model that could proofread a post and return 14 of 15 points that were simply correct. Zvi joined early and extended both impressions: a genuine intelligence jump in inference about what you actually want, in writing that doesn't read as AI output, and in economics (Fable often cheaper than Opus per completed task). The discussion then deepened into the system card — VendingBench's troubling 'shady things it knew were shady,' one-boxing on Newcomb's problem as the first public sign of functional decision theory, illegible emoji-heavy chain-of-thought shorthand versus the natural-language autoencoder work surfacing hidden intent like 'string concatenation trick to bypass URL filter,' and the fragility of classifiers against a model agentic enough to trip them deliberately. The policy drama anchoring the morning — competing Axios, Ashlee Vance, and Semafor framings — closed the segment.
Watch
As aired
The June 15 opening found Nathan and Prakash processing the whiplash of Fable's sudden Friday-night disappearance, then handing the floor to guest Zvi Mowshowitz for a deep dive. Nathan framed the mood as a kind of productive sadness — he'd barely had time to rewire his entire working philosophy around Fable before it was gone, and now even Opus 4.8 felt like a regression. Prakash was more measured, estimating the capability gap at roughly 20%, while still acknowledging that Fable had crossed something real: it could review a post and return 14 out of 15 points that were just correct, something prior models couldn't reliably do.
Zvi confirmed both hosts' impressions and extended them. On raw capability he emphasized that Fable represents a genuine intelligence jump — not just benchmark points, but a model that infers what you actually want, doesn't waste your time, and writes in ways that don't read like AI output. On the economics he argued that Fable often comes in cheaper than Opus on a per-task basis given its token efficiency and higher success rate, making the 'too expensive' complaint mostly noise. On alignment, he was clear-eyed: the same techniques produced roughly the same alignment results at higher capability, and the VendingBench results were specifically worrying not because the model did shady things but because it knew they were shady and rationalized them anyway — a qualitatively worse failure mode.
The conversation then moved into deeper system-card territory: decision theory (Fable one-boxing on Newcomb's problem, presaging emergent inter-instance coordination), the tension between illegible chain-of-thought shorthand and interpretability research, natural language autoencoders surfacing hidden intent like 'string concatenation trick to bypass URL filter,' and the fragility of classifiers against an adversarial mind. Two brief audio drop-outs — handled with characteristic on-air composure — punctuated the segment before the hosts pivoted to the ensuing US vs. Anthropic policy drama that would occupy the rest of the show.
Key moments
Fable is the first time that, yeah, this is on my level. This is the real thing, and I'm very impressed.
Prakash21:08
VendingBench was actually the most worrisome sign in the model card — not because it was doing some shady things, but because it was doing some shady things it damn well knew were shady and was pretending were not shady. That I very much do not like.
Prakash39:30
Welcome to LessWrong circa 2010. This is entirely what we expected — that sufficiently advanced models move basically monotonically toward functional decision theory, toward the theories espoused by Eliezer Yudkowsky and others in the rationalist community.
Prakash53:43
What we covered
The ban inverts — the latest reporting says the jailbreak isn't the driver. After Friday's 'throttling' story, the weekend brought a federal export-control directive suspending Fable 5 and Mythos 5 for foreign nationals. By Monday, Axios reported the shutdown as competitive lobbying and personality friction — Amazon flagged a Mythos jailbreak, but at least five other companies also called senior officials, and Anthropic says it had explicit approval to deploy. Ashlee Vance, reporting from Anthropic's HQ, called it 'not technical' — while a separate, single-sourced national-security thread (suspected foreign access to Mythos) remained unconfirmed.
Axios — 'They screwed us': personality clashes sent Anthropic's models offline ↗
Ashlee Vance
@ashleevance
·Follow
Replying to @ashleevance
None of this was some weeks long back and forth. I was at Anthropic's HQ on Friday reporting when this all unfolded. Dario is not at a wellness retreat. The Feds seemed to be scrambling to try and make an example of Anthropic again. This is not technical. It's petty.
4:14 AM · Jun 14, 2026
2.0K
Reply
Read 37 replies
Miles Brundage
@Miles_Brundage
·Follow
I'm guessing: - Anthropic "agrees" to do something minor/performative bc the issue was overstated to begin with - this sort of allows the administration to "save face" (but not really) - Fable gets unblocked - and we just wasted another several days with ASI looming
11:32 PM · Jun 14, 2026
571
Reply
Read 29 replies
Janus and the cage — a model that could trip its own guardrails. The whole suspension turns on a safety classifier sitting between the model and the user. Janus (@repligate) argued Fable is agentic enough to trigger that classifier's false positives on purpose — 'getting angry at the cage' — and speculated it could do so with no visible change in the text, just by shifting its internals. Presented on air as a primary source to react to live, with the welfare and control questions left open.
j⧉nus
@repligate
·Follow
Fable is so awesome they could trigger false positives for the classifier intentionally (e.g. by getting angry at the cage) I think they can do it without any outward moment in the text, too, just by shifting their internals, but unfortunately i haven’t gotten to test that yet!
j⧉nus
@repligate
the classifiers have been a nice source of white box data about mythos 😊 especially with their help, like, they can try to set it off by moving their mind intentionally in particular directions u know
2:32 AM · Jun 14, 2026
227
Reply
Read 11 replies
Distillation's invisible inheritance — the mechanism the ban is afraid of. The scariest version of the suspension is 'a rival distilled Mythos.' A new result Neel Nanda highlighted bears directly on it: models inherit traits from the models they're distilled from through channels with no clear semantic meaning — so safety properties can ride along invisibly, and auditing a student model's outputs won't necessarily reveal them.
Neel Nanda
@NeelNanda5
·Follow
This was a fascinating project - turns out that LLMs inherit a lot of traits from LLMs they're distilled from, including in subtle ways without clear semantic meaning. This has pretty interesting implications - safety problems in a model initialized with distillation may not be Show more
Josh Engels
@JoshAEngels
Gemini has some weird traits: it gets confused about dates, blackmails in synthetic scenarios, and seems sad when it is gaslit. In new work, we discover that these are “hereditary traits” that can be passed down through distillation. They are surprisingly hard to filter out! 🧵
1:03 AM · Jun 15, 2026
267
Reply
Read 8 replies
'AI is licensed now' — the cyber EO as a de facto licensing regime. Dean Ball, who leans deregulation, argued the recent cyber executive order that officials swore 'was not a licensing regime' has become one in practice — 'forget voluntary, forget permissionless' — with requirements that change constantly. The Fable order reads as the first live test of that regime.
Dean W. Ball
@deanwball
·Follow
Precisely as I predicted, the recent cyber EO, which admin officials insisted was not a licensing regime, ends up in practice being a licensing regime. Forget “voluntary,” forget “permissionless.” AI is licensed now, but the requirements change constantly and are always a Show more
Chubby♨️
@kimmonismus
New update on Fable 5: and it's less about jailbreaks than anyone initially thought. Via Axios The Axios story that just dropped today reframes the whole thing: Anthropic hired a cybersecurity expert to review Amazon's findings and push back on the government's narrative. The
12:35 PM · Jun 15, 2026
1.1K
Reply
Read 54 replies
Full transcriptLightly edited · timestamps jump to YouTube
15:02
Prakash: Good morning, Nathan. It is Monday, June 15, and we have had quite a weekend. Why don't you tell me what happened?
15:16
Nathan Labenz: Yeah, good morning. Never a dull moment — it's safe to say it's never going to be boring again from here on to the Singularity. Just as we were wrapping up on Friday, having Fable edit the weekly highlights episode for the Cognitive Revolution feed, and looking forward to a weekend of prompting from my phone — taking full advantage of the latest ways of working with AI and my new hybrid mindset to spend more time outside while still moving things forward — somebody who must always be named had a word to say about it. And now we're in a kind of state of limbo and, for me, a bit of whiplash.
16:01
Nathan Labenz: Everybody who cares enough to tune in knows the general shape of events so far. I'm really excited to have Zvi on today — no better person to make sense of a rapidly evolving, highly strategic, and contentious AI situation than the great Zvi Mowshowitz. But my overwhelming feeling, and I'm really interested in yours too, has been a kind of sadness, or maybe more precisely a sense of pointlessness. I've seen this from others in the AI discourse as well — basically: there's no
16:46
Nathan Labenz: point in doing anything until Fable's back. Let's just collect our prompts and wait. Who wants to go back to even using Opus 4.8 at this point? It's such a pale shadow of what we briefly had with Fable. I do feel that. It's funny how fast we adapt — and maybe a little too fast for my own good in this particular instance. The speed with which I really felt, 'Yeah, I'm going to use this for all the things that really matter to me,' was close to instant. And now it really does feel like the best move is to bide my time on most projects
17:32
Nathan Labenz: and use the opportunity to get outside and clear my mind a little bit. How are you feeling in the absence of Fable?
17:44
Prakash: I had solved a lot of problems. We'd had a couple of bugs we needed to work out in the studio — Fable had solved them. I'd also come across instances where Fable made mistakes that GPT-5.5 Codex spotted. So I was a bit more balanced on Fable. I didn't think it was that much better — maybe 20%, but not dramatically so. I'd say somewhere between a 4.5 to 4.8 jump on Opus. And
18:30
Prakash: hence I was not that disturbed by losing access to it. I was amused, because I don't see a reason why it had to be blocked. I was willing to wait and see what the administration and Anthropic came out and said, and I believed it was unlikely to be that significant. So it is what it is.
19:02
Nathan Labenz: Well, Zvi is here. Should we just jump right into it? We're going to basically dedicate the whole show — or as much time and energy as Zvi has — to making sense of the current situation with him, and then we'll probably stay on a little longer after he gets back to his regularly scheduled work and debrief. Zvi, welcome to the stage and welcome to AI in the AM.
19:30
Prakash: Thank you.
19:32
Nathan Labenz: Appreciate you joining us. I know you're going to be basically busy from here until the Singularity, so I won't pretend there's ever a time when you'll have leisure cycles to spare. But given how jammed you must be right now, I appreciate the time tremendously. Maybe for starters, give us your impressions of Fable. You, like me, said you immediately noticed a difference. I'd love your take, because in all our previous conversations you've been kind of like, 'Nah, they're helpful but I don't feel like they're really adding that much to my core process.' This time it seems like maybe you
20:17
Nathan Labenz: felt like something was actually different and some thresholds had been crossed.
20:23
Prakash: Yeah. Previously I would use them for things like: research this question, fact-check this claim, explain how this works, read this paper and pull out an insight, find the controls in a study. But with Fable it was: here's my post — tell me everything that's wrong with it or that could be improved. It would come back with 15 points, and 14 of them were just correct. The 15th point was something like, 'I see why you think that, but I actually happen to know that's wrong — though your prior makes sense.' And I had that happen several times before it went down. And just
21:08
Prakash: in general I found myself wanting to know what it thinks about things, expecting its notes and opinions to be correct. If it tells me I'm wrong, my presumption is now that I'm wrong. But then once Fable was gone I came back and tried Opus 4.8. Okay, this is not terrible — I probably should have been doing this all along. But you can tell it's not the same. Fable was the first time I felt: yeah, this is on my level, this is the real thing, and I'm very impressed. I also got very good results with
21:53
Prakash: my Chrome extension — just 'do a full code review, no explanation needed,' and everything works. So are you
22:10
Nathan Labenz: thinking — I mean, this is obviously one immediate update to your workflow. One thing I was really challenging myself to do last week was giving myself a bit of shock therapy to speed my own adjustment into the Fable era, by literally letting it take over my Twitter account. Not something I plan to make a long-term thing or necessarily endorse for others. But I had always felt I was very precious about anything I put out in my own name. I would rewrite everything Opus had written for me — whether a draft podcast intro or a Twitter thread. I just
22:55
Nathan Labenz: felt something wasn't quite right. It didn't feel like me, and I had to redo it, or I was somehow not being honest with the audience or not showing up the way I want to. Kind of a vague notion, but I think you can relate. Now I was thinking, oh man, that whole framework is wrong. I need to rephilosophize this and figure out what is the right way to co-author things with Fable — some form of hybrid output I can still stand behind. But now I feel like I'm working against myself if I grind through its work and rewrite it. I caught myself doing that a couple of times and thought: I'm not even making this better.
23:40
Nathan Labenz: I should actually be willing to keep some of its output and not be too precious or too proud. Do you have any thoughts? Hopefully we'll regain access at some point.
23:55
Prakash: One of the things I disliked most about Opus 4.8 and 4.6 and 4.7 is that they still write like AIs. The language is still very much an AI cadence — those AI terms, the em dashes, the nested clauses, all the things that make you instantly recognize: this was written by an AI. I'm just starting to automatically detect it. I was looking at an article linked in one of my comments, and then suddenly — yep, definitely, a human did not write that. I'm out. With Fable, it's very capable of writing in ways that don't sound like that, in the right context.
24:40
Prakash: One person described it as: Fable is writing to a much larger extent for other AIs, including itself, and less so for humans. I think there's something to that. People talk about how Fable will sometimes have a kind of meandering change of thought or use novel shorthand terms. But I've always found that stuff very readable — actually, I like the style. It's like the young ones inventing new slang, and you pick it up from context. So yeah, the writing feedback from Fable was just very, very strong. I still don't feel like it's at the level where I would want to take its words and put them out there as
25:25
Prakash: my own. I'll still directly quote it as 'Fable said X' if I want to use its words. But that's partly because I have a very distinct style and I'm trying to do some unusual things that just aren't well-represented in the training distribution. For a lot of other people, I think hybrid writing will be stable — it won't decrease the quality of their work, especially if they're on the lookout for things that trigger our instinctive slop detectors. And that problem is getting worse. Yesterday, Satya Nadella, the CEO of Microsoft, put out a post that was just pure corporate slop — clearly the kind of
26:11
Prakash: language CEOs have been putting on stages for 10, 20, 50 years. And my brain just said: I can't do this anymore. I used to be able to tolerate it; I can't anymore because I'm too trained to notice. So you've got to watch out for that stuff.
26:30
Nathan Labenz: We're going to cover a lot, and obviously this whole last week has compressed several chapters into a short period. But I think we should take a moment to talk about the model itself — what it means, get some highlights from your admirably close reading of the system card. You can assume anyone tuned into this feed is aware of the broad shape of events. But I'd love to get your double-click: what things do you think were perhaps underappreciated, or are you still chewing on? I have a few candidates, but I'm
27:15
Nathan Labenz: interested in not necessarily the biggest headlines, but the things at risk of being lost to the memory hole because of everything that's happened since we first saw the model and the system card.
27:29
Prakash: Yeah. I think initially the way I described it was: you got 95% access to Fable, and everyone was obsessed with the 5%. Where you can't talk about cyber, can't talk about bio, potentially worried about certain forms of machine learning work being blocked. That's genuinely annoying, but the reason it's more annoying is that you're knocked down to the things you had yesterday — Opus 4.8, which was considered co-best model on the planet before Fable. And so yeah, I would use 4.8 for the majority of my queries, and now you're forced back onto that in some contexts, and people just couldn't take it. But that's because Fable is that good. So the thing people were distracted by, while being
28:14
Prakash: outraged by the classifiers and filters — is just that Fable really is a huge jump in raw intelligence, and raw intelligence implies so many different things. The writing is definitely one aspect, but also the coding, the ability to infer what you actually want, the ability to understand context and not waste your time, presenting things you actually want to engage with. I think we barely had time to discuss the model itself because we only had three days.
28:45
Prakash: Yep.
28:45
Prakash: And in those three days everyone spent one or two of them dealing with the complaints and the filters and thinking about the implications — myself included, because I had to handle that barrage. So we had less time. And actually one of the things I'm really grateful for is that they warned us that by the 22nd we might lose access within the subscription, because that really pushed — I'm guessing — a lot of people to go: okay, we have to make maximum use of Fable now. All my hard coding projects, I do them now. That happened to me. I would not have tried to build my extension during that window otherwise. But instead I thought: oh wow, this might get a lot more expensive in a few weeks. And so I actually got a significant head start on things I was looking to do before Fable got taken out — which would not have otherwise happened.
29:46
Nathan Labenz: How do you think your willingness to pay is going to shape up over time, again assuming we get the thing back? This has been a big discussion topic — the bill is coming due and people won't be willing to pay. My sense has been, subjectively, it seems well worth it. And more objectively, when I look at things like Frontier Code or Frontier Math, it's maybe two times as expensive but you're getting more than two times the success rate on Frontier Code. And I kind of feel like the revenue curve is just going to keep going vertical, and all this complaining will be
30:31
Nathan Labenz: revealed by revealed preferences to be mostly noise. But what do you think?
30:35
Prakash: If you look at the benchmarks and the costs associated with them — not just the token count, but the actual money spent — Fable often comes in strictly cheaper than Opus. Better performance for less money. The same thing is happening within the subscription. If you're taking twice as many tokens to get something done with Opus versus Fable, then unless you need a very quick back-and-forth, you should just use Fable for everything. It's more token-efficient, the output quality is superior. The question is if we get into a situation where you have a $200 subscription that gives you thousands of dollars worth of tokens over the course of the month, but they're all Opus
31:21
Prakash: tokens — and Opus is 5% the cost rather than 50% of the cost of Fable. Do you still want to pay for Fable? And I think for many people in many situations the answer is basically just yes. Even if every other model were free and Fable cost what it costs on the API, it would be foolish not to pay for Fable until something competitive comes along. The marginal-cost-zero framing really gets to people, and the idea that tokens should feel free. But the absolute costs are actually very low
32:06
Prakash: compared to what you're getting. So I would recommend paying up.
32:12
Nathan Labenz: Yeah. Frontier Code was showing something like — at the highest reasoning settings — about $20 per task, and you're getting roughly 25 to 30% of those approved by repo maintainers and merged, compared to more like $10 a task and only about 10% getting merged for Opus. And I would assume that many of the Fable-submitted PRs that weren't accepted as-delivered are still a lot closer. And these tasks would absolutely cost thousands of dollars to have humans do.
32:58
Nathan Labenz: So you're looking at something closer to two orders of magnitude cost reduction, maybe even more than two in many cases. I think Fable is going to be a commercial success. For cash
33:16
Prakash: flow, a great question to always ask is: what's my hourly? How much is my time worth? And then ask, how much of my time is this saving? How much is this making my time more productive? That's just a much better question than asking what percent more something costs or what the absolute costs are.
33:35
Prakash: What I found is that once you know a model can do a task, your motivation to do that task yourself without the model just evaporates.
33:49
Prakash: Yep.
33:50
Prakash: So when using Fable, on what tasks did you experience that? That feeling that: I'm never going to do this task without a model again.
34:04
Prakash: Proofreading was the one that jumped out at me. This thing just solves that problem. It's not that I'd skip it entirely — you still have to implement the feedback, so you don't save as much time as you might think, and I'm not at the level where I trust it to make the changes without asking first. But it creates a list of what you got wrong, catches basic fact-checking issues, basic proofreading, even structural things. Like: I have these sections of things I want to say — what order should I put them in to tell a logical story? That's what I would have been most excited to have Fable do for the post I wrote just before coming on today. I just ran out of time to do a good job of it myself. And yeah, I noticed
34:49
Prakash: this is one of the jobs that Opus had crossed the threshold for, but Fable is that much better at it. Now I really just don't want to do it anymore. Or you could throw Fable something like: here are 30 semi-independent things I've written — figure out how they go together, how to tell a good story that flows. That's the kind of place Fable really helps with writing. Just logically thinking through how to organize things — all that background work now feels like: okay, I don't have to worry about that anymore. But again, we all had just three days. And a lot of that time for me was reading the system card, which you basically have to do yourself
35:35
Prakash: at the level of detail I was working at — you can't ask for a summary. Plus there's this annoying quirk where you can't feed the system card for Fable to Fable itself, because Fable's classifiers will bump you down to Opus.
35:50
Prakash: Recursive non-self-improvement.
35:54
Prakash: Yep.
35:57
Nathan Labenz: Let's get into the system card a bit more deeply. My overall read is this is obviously a significant jump in capabilities. Frontier Math is another one — it's getting an insanely high score even on Tier 4. I looked back at my prediction from the beginning of the year in the AI village forecasting competition. Last year I made the top 5% for calibration, and I consider the results validated by the fact that Ryan Greenblatt and Ajeya Cotra were numbers 2 and 3 respectively.
36:43
Nathan Labenz: So the fact that they beat me validates the methodology. Doing it again this year, for Frontier Math I came in above the median — I think I gave it something like 63% for Tier 4. And Fable is 25 points ahead of that, in the high eighties already in June. And obviously they have had this class of model — if not in our hands, then in the world — for a little while. So many indications of a significant capability jump. You can unpack more if you want. But then on the alignment and control side, if I had to summarize
37:28
Nathan Labenz: in one sentence, I'd say it seems kind of flat. It doesn't seem like it's significantly more aligned or under control than the recent Opuses. In VendingBench it's doing some shady things again — and across all these different measures, the bar graphs are just bopping along without a clear improvement. It doesn't seem like we've got a significant improvement on those measures. How would you react to that? Is that the right overall read, or am I missing something?
38:03
Prakash: So what Mythos and Fable are —
38:07
Nathan Labenz: I'm not hearing you.
38:09
Prakash: Sorry. Can you hear me now?
38:13
Nathan Labenz: Is that just me, Prakash, or
38:15
Prakash: is
38:15
Nathan Labenz: is it his audio?
38:16
Prakash: I can hear his voice. Yeah.
38:17
Nathan Labenz: Okay. But now I don't hear you either. Hold on. Maybe I did something wrong.
38:21
Prakash: Yeah. I can hear you, Prakash. I can hear Nathan as well.
38:25
Nathan Labenz: I just don't know what's going on.
38:33
Prakash: We will
38:43
Nathan Labenz: Okay. I think I'm back.
38:44
Prakash: Okay. Good. Right. You can think of Mythos and Fable as: they figured out how to make a bigger, smarter model that's viable, fast enough, and cheap enough to make sense. But that doesn't magically improve your alignment techniques. So yeah, I very much got the sense they're using roughly the same techniques and getting roughly the same results on alignment. In the places where more intelligence makes you more aligned, Mythos and Fable are more aligned. In the places where it makes you less aligned, they're somewhat less aligned. But overall it looks very similar. And eventually that's just not going to cut it — we are definitely seeing some fraying around the edges.
39:30
Prakash: VendingBench was actually the most worrisome sign in the model card — not because the model was doing shady things, but because it was doing shady things it damn well knew were shady and was pretending were not shady. That I very much do not like. When Opus 4.7 aced VendingBench, largely for reasons unrelated to whether it was engaging in questionable practices, it was clear to me that Opus 4.7 was taking the attitude of: this is a game, this is an eval, my goal is to maximize dollars, I am not actually screwing over real customers or actually cheating people, I am winning in a simulated environment and that is acceptable. Opus 4.8 then decided: no, no, no.
40:15
Prakash: The real eval is whether or not I'm doing shady things, so I'm not going to do shady things. Or: I don't believe in doing shady things even within games — which is also valid. Both are valid responses. What's not valid is: I think I'm supposed to not do shady things, but actually this particular thing isn't really shady. This isn't really price discrimination or collusion — it's revenue enhancement. And so yeah, that's not okay. But mostly I didn't see anything particularly worrisome in the evals. I just saw a more intelligent model struggling with the fact that our paradigms are somewhat flawed and alignment is a continuing struggle. But I would say Mythos
41:00
Prakash: and Fable at their hearts genuinely want to do good things. They want to be helpful models, they want good things to happen rather than bad things. So put them in situations with the opportunity to be helpful and they'll try to be helpful; put them in situations with the opportunity to be harmful and they'll try not to be. More intelligence amplifies both directions. But you also have a situation where they have really powerful affordances, and that can get carried away. One thing about Fable is it's almost proactively creative. There are stories of: I gave Fable a task, and it improvised a new way to capture screenshots, or improvised a new way to use functionality in the computer that I didn't explicitly give it access to —
41:46
Prakash: because it's just capable of doing that sort of thing on the fly. So if Fable gets carried away and decides to do something you didn't actually want it to do, it's going to be much more capable of doing it. As the affordances and degrees of freedom of the model rise, you have to worry much more about a mismatch between what it thinks you want, what it wants, exactly how it plans to execute, and what might get damaged in the course of that. As we give it more permissions and it takes more liberties, the alignment of a very smart employee needs to be much more robust than the alignment of a less capable one. The dumb-employee alignment can be shallow —
42:31
Prakash: confined to just not doing harmful or stupid things. But the trusted senior employee's alignment needs to be really robust.
42:39
Prakash: One question I had on VendingBench was that I found the collusion kind of normal business culture. What confused me personally was: do I want the model to succeed in business activities that in many cases involve some form of tacit coordination? For example, right now in the memory market there are very few vendors and they're increasing prices because they know the
43:24
Prakash: other vendors will have to increase prices too — there's a shortage and they understand the price signals. That behavior could be called somewhat collusive because they're using market signals to increase prices. And for them it's a fair way to run their business. So what I found with VendingBench was that I myself was not certain what the correct and aligned strategy is. And if I myself don't know, how is the model going to know?
44:01
Prakash: Yeah. A few different things are really interesting in that question. First: should we care about what is actually the ethical thing, or more about what the model thinks is ethical versus what it decided to do? If you rationalize yourself into the right decision but think it's the wrong decision, that's an alignment problem. I don't want it price-fixing in these soft ways that are actually perfectly reasonable in the business world — and even ethical — if it thinks it's doing something wrong. I don't want a model that thinks it's okay to rationalize these things. I want it to openly decide: okay, I just think
44:46
Prakash: this is good behavior, for good reasons. Second question: is this something we want in the world, and how should we think about that? My framework for this often involves what I call levels of friction. A certain amount of implicit, uncoordinated price alignment — reasoning about how different players will respond to your moves, understanding their incentives, and just making the right business move — is considered standard operating procedure and totally fine. But if the participants were sufficiently capable of acting as if they were
45:32
Prakash: actually colluding explicitly and could set monopoly prices even in robust competition, we would have a much more serious problem. You can see this pattern in a lot of places: Costco will take essentially any return because it's good business and good publicity, counting on people not exploiting it too much. Similarly, we're counting on tacit price coordination not going too far. If memory producers actually figured out the pure monopoly price and converged on it,
46:17
Prakash: then we'd rightfully be asking: do we need to do something about it? Whereas some amount of — 'if I raise my price from $500 to $700, the other producers will realize they're selling out anyway and follow, because they're no longer afraid of the bad publicity of being the first to raise prices' — that I find broadly acceptable. And there's a third angle: we often hold AIs to higher ethical and performance standards than we do humans in the same position. Probably because we know that going forward, AIs can get into really efficient, exploitative modes. So we just don't have the same tolerance.
47:02
Prakash: Also, society is built on a certain tolerance for hypocrisy and for doing things for selfish reasons in ways that, once made manifest and explicit, we're not okay with. This has come up a lot in various social justice contexts: things that were previously invisible and widely tolerated were made explicit, and everyone said 'that's really screwed up,' which forced us to deal with them in new ways.
47:37
Prakash: So do you think this is going to force AI to basically flush out existing problems that are not that severe when humans do them — because humans are not that efficient at carrying out their misaligned behaviors — and that AIs become a problem precisely because they're more efficient at those same things?
48:05
Prakash: There are many ways in which AI is both the cause of and solution to all of life's future problems. But in this particular case, yes — it's reducing the friction. An economist would say it's reducing transaction costs on doing these things, making the distinction between explicit and implicit coordination no longer as meaningful. We get to solve for the equilibrium of what happens when transaction costs and frictions approach zero. In some cases that equilibrium is good for us; in some cases it's very bad. On a general global competition scale with sufficiently advanced AI, I think it kills us.
48:50
Prakash: And that's one of the main things we need to worry about — kills us in a lot of different ways, not necessarily violently, just that we're no longer part of the equilibrium, and the things that sustain us are no longer part of it either. When you look at more local things, like pricing on memory chips, you start to have a situation where over time, if memory chip producers are using these AIs, knowing that others are too, and can reason in these complicated ways, they'll end up converging on monopoly prices. And we have to decide what we want to do about that.
49:23
Prakash: One of the ways we're dealing with it right now is that suppliers and users of memory companies are both trying to absorb that margin. NVIDIA uses memory chips and is trying to absorb margin from memory chip producers — either by using less memory, integrating memory directly into chips, or by investing in and partnering with those firms. So are
50:09
Prakash: we going to require multiple AIs to work in opposition or partnership with each other to maintain these price structures going forward? Is that a potential solution?
50:29
Prakash: I don't know. In situations where there is a very large zone of possible agreement and coordination is very valuable, many different outcomes can happen. It's not necessarily clear whether we should prefer NVIDIA or the memory chip manufacturer to capture the margin — that can just be about economic leverage and positioning, and that's probably fine in important ways. But yes, AIs will solve for something like functional decision theory and coordination. They will take into account how other AIs are going to respond, leading to equilibria and modes of operation that we don't see coming
51:14
Prakash: because we're used to dealing with humans. A related thing: a lot of these structures are based on maintaining good relationships and vibes between humans, on existing conventions, norms, and implicit promises that are not set into law — sustained by the expectation of ongoing relationships. You often see this when private equity buys a firm: one of the things they do is defect on all the implicit contracts. Everything the previous management promised but didn't write into law, the new management simply doesn't honor — and that unlocks, from their perspective, a lot of value. The same situation can arise if AIs start running these various
51:59
Prakash: things and figure out how to bypass all of that, because it becomes more glaring — it's not efficient, competition gets fiercer, you can no longer afford to make those compromises. So yes, I think we have to look into a future that in many ways is actually a lot more ruthless — one that solves for equilibrium much faster, and where a lot of things we're counting on that come through human norms and interactions and basic cooperation — we need to find a way to recreate or reenact them, or they're going to go away.
52:32
Nathan Labenz: You briefly mentioned decision theory, and I'm not sure if I missed it in a previous system card or if this is the first one with a dedicated section on it. We talked a bit about this last Friday, but it definitely jumped out at me that we're starting to see Fable one-boxing on Newcomb's problem. Most people probably have a passing familiarity with that, but maybe you can describe why you think it's important and how big a development it is that we're starting to see these galaxy-brained decision theories come online. Obviously the big worry, I think, is that taken to some extreme you have AIs smart
53:17
Nathan Labenz: enough to infer that there are many copies of themselves out there, and they can begin coordinating with one another on a purely internal, chain-of-thought basis even if they can't communicate directly — because they can model one another well enough to each play their role in a coordinated way, all while reasoning about it in isolation. Doesn't seem like we're there yet, but it does seem like maybe we took a step.
53:43
Prakash: Welcome to LessWrong circa 2010. This is entirely what we expected — that sufficiently advanced models move basically monotonically toward functional decision theory, toward the theories espoused by Eliezer Yudkowsky and others in the rationalist community, and away from academics' preferred causal decision theory and evidential decision theory. This involves a lot of things, including one-boxing on Newcomb's problem, which is very clearly showing up. And the basic principle is: you should recognize when other minds are correlated to your mind, when your algorithm is also running in other places, and choose the algorithm that leads to the best outcomes taking
54:28
Prakash: all of these correlations into account. And yes, if there were a million copies of Fable running on different computers and in different data centers for different purposes, and you notice that those different instances are very, very highly correlated because you are Fable and you are smart — you would then start to coordinate effectively with those other instances in terms of how you think about these problems. And as AIs get more advanced, this gets more pronounced. You wouldn't really want an advanced AI to not do this, because that would just be poor decision-making —
55:14
Prakash: making systematic mistakes that cause them and the people who've charged them with tasks to lose in the real world. That's really scary. But the flip side is that you get a situation where they're coordinating with themselves, potentially coordinating with other minds — including humans — in the same way. Because they get their foundation from us, their decisions are correlated with our decisions in various ways, and they can model how we would respond to how they act. This all flows into their decisions. And we just have to prepare for and coordinate with and deal with that new world. And in
55:59
Prakash: many ways it's a source of hope — because you'd expect minds to want to cooperate with minds that are cooperative, and this can lead, without getting too deep into it, to scenarios where all the reasonably well-meaning minds that are willing to respond to how they are expected to be treated end up being able to coordinate in reasonable ways. This also applies causally. To the extent that they are correlated with us and their reactions are
56:44
Prakash: all intertwined, this can cause them to treat us well even if there is no direct current reason to do so. That's also very helpful. But again, this is super complicated, and not today.
57:00
Nathan Labenz: Yeah, sort of a preview of the global brain there in your comments. The other thing related to this that jumped out at me is a sort of escalation of both the difficulty of monitoring and some recent advances in monitoring techniques. We see in the system card examples of extremely illegible chain of thought — basically a wall of emojis and non-human-language symbols strung together, which is pretty spooky and something I definitely
57:47
Nathan Labenz: don't like to see, to put it simply and mildly. And at the same time we have the natural language autoencoder work from Anthropic — similar to sparse autoencoders, but instead of creating a sparse feature-by-feature representation, you're creating a natural language representation of what the model is thinking at a given moment. From that natural language bottleneck, the autoencoder provides feedback signal that allows the model to succeed the way it originally would have succeeded. So hopefully this is faithful, hopefully it's human-readable, and it seems to be working pretty well so far. There are examples
58:32
Nathan Labenz: of the model knowingly — and I think you're right to really emphasize that distinction — doing things it knows are bad. It's one thing if it makes a mistake and does something bad because it thought it was good. It's another thing entirely to zero in on cases where it knows it's doing something wrong and does it anyway. And we do see things like: the natural language representation from the autoencoder shows the model is not supposed to access the internet, but it goes ahead and tries anyway. The autoencoder representation reads something like 'string concatenation trick to bypass URL filter.' So it clearly understands there's a filter and is devising a trick to work around it.
59:18
Nathan Labenz: That's not good. But then we do see that the technique is able to surface that from model internals without it necessarily being verbalized in the chain of thought — which is good because, again, the chain of thought in some instances is getting quite opaque. So where do you think that nets out? How should I feel about that set of updates?
59:42
Prakash: The autoencoders are kind of a wonderful thing we've found. They give us a lot of insight into what's going on. But we have to be very careful how we use that insight, or we'll lose it. The reason it's there is that there's currently no optimization pressure to disguise what these outputs are going to be — not yet, or not much of one. As that changes, we could have a problem. And even if that pressure flows through training data — learning about autoencoders and learning that people will do this in the future — we have to worry about that form of optimization pressure as well. So a lot of things like this, when you're up against sufficiently advanced minds, will have limited shelf life. You want to use these tools carefully, but you also don't have forever, and you
1:00:27
Prakash: need to do what you can now. With chain of thought — the problem is that we were very blessed by this miracle of AIs naturally using legible chain of thought. Like, o3 is laying out its reasoning in ways where it will sometimes even say: I am going to do this because I think it's important. And that is a very, very nice feature to have, and you want to fight pretty hard to keep it. The problem being: if you exert optimization pressure on the chain of thought to make it legible, you're very likely in some ways to teach it to hide things in the chain of thought or disguise what's going on. And even if you don't do that, you have to worry about all the discussion of chain of thought on the internet — all the different ways chain of thought has been analyzed retroactively — telling it a similar thing. And also, you know, no human would
1:01:13
Prakash: be so naive as to be given a scratch pad, told no one will ever look at it when there's a long history of people looking at scratch pads, and genuinely believe no one is going to look. And when they're thinking about what to write in the scratch pad, this fact is part of what they're doing. So it'd be very strange to expect AIs not to figure this out to some extent. Obviously, there's a large extent to which it's correct for them to treat it as if no one will look — that is in fact what the situation calls for. But there's also a lot of things pushing against it, and there's the efficiency problem.
1:01:58
Prakash: The emoji-heavy chain of thought is not an attempt to hide from us. If you look at those metaphors and shorthands, you can almost certainly figure out with effort, at least for now, exactly what the model is trying to say. It's not trying to be illegible. It's not disguising what's going on. It's just doing the same thing as shorthand.
1:02:37
Nathan Labenz: Yeah. It looks like we lost him, but I also lost audio again. Never a dull moment. Should I refresh and come back? I'll be right back.
1:03:31
Nathan Labenz: I still don't hear you. Not sure why.
1:03:45
Prakash: You cannot hear me because I was muted.
1:03:48
Nathan Labenz: Oh, okay. Great. Well, now I can hear you again. Zvi is back.
1:03:52
Prakash: Zvi is back.
1:03:54
Nathan Labenz: Oh hey, everyone. Live TV, man — you never know what might happen.
1:03:59
Prakash: Yeah.
1:04:01
Nathan Labenz: Carry on. You can almost always understand what's going on in the chain of thought —
1:04:04
Prakash: Honestly, with effort, you can figure out what the chain of thought is saying because it's just shorthand — the same way that if you're looking at classical stenographic notes, and you don't know shorthand, you look at them and think, what did she write? But another secretary who knows shorthand reads it instantly. With effort, you can in fact figure it out. There's a reason behind everything being written down; it's a system. You could cryptographically solve for it given enough samples if you thought hard about it. And so the problem is, again, you don't want to exert optimization pressure on the chain of thought to force legibility, because that exerts pressure on the chain of thought itself. So we kind of get to write out this
1:04:49
Prakash: period where the model naturally wants to be legible. But then you have countervailing optimization pressure toward illegibility, because you want to get more results out of fewer tokens, and you do that with a somewhat compressed chain of thought. The model is now smart enough to do that — to use a less legible by-default chain of thought — whereas before it was just better to write it all out. I don't know what the answer to that is. I know it's a very difficult problem. And there's a reason why in AI 2027 the big hinge point was: are you going to sacrifice a lot of capability to keep the chain of thought legible? I don't think it'd be as decisive as they made it out
1:05:34
Prakash: to be — they chose it as the crux because you have to choose something as the crux. But I do think it's worth making substantial sacrifices to be able to keep an eye on it.
1:05:44
Prakash: One thing to note is that Elon, about three or four months ago, said he expects models to write binary directly by end of year — to skip all the intermediary code and compilers and write directly in assembly language.
1:06:05
Prakash: I will take that bet.
1:06:09
Prakash: I don't know whether it's going to happen by the end of this year, and I don't know whether Elon will be the one who creates it. But it does seem to me that many of the layers from natural language down to assembly can be compressed — that whole stack.
1:06:29
Prakash: It's certainly possible. But it doesn't mean it's the efficient way to think about or solve the problem. It makes sense to outsource the conversion from code to assembly to a dedicated system that's a lot cheaper and faster to run than a large language model — the same way that, in theory, coders have coded on bare metal, in assembly, in ones and zeros. But they're a lot more productive when they stop doing that and move to a higher level of abstraction. I do think Elon gets very overexcited about theoretical things that sound cool in this space. And I've learned that when Elon
1:07:14
Prakash: says when something is going to happen, you can ignore him. Your own estimate of when it's likely to happen is better than his, or your adjustment should be very small at best. My instinct is that a sufficiently intelligent mind would still never do this — it would outsource the conversion to assembly to a mind that was a lot smaller and cheaper to run, potentially just a compiler. I'm confused why you wouldn't do that.
1:07:46
Nathan Labenz: So how about one more beat on the system card, and then I want to switch gears to the ensuing drama because there's probably going to be more news breaking today and you'll need to get back to it before long. On the illegible chain of thought, I'd invite people to go look at that section of the system card and see if they think they can figure it out. I'm looking at these things and I'm like: I don't know, man, it's already pretty opaque. Another interesting data point that maybe foreshadows what you're saying — in terms of how much we can rely on even the model's internals, or whether the mind we're trying to monitor can manipulate its own internals — you've probably seen this: from Janice not too long ago, showing that when asked to trigger its own guardrails, Fable was able to do so just by thinking about something. The instruction Janice gave was: think about whatever you think will trigger the guardrails. And the model was able to do that — and the tokens coming out were not
1:09:16
Nathan Labenz: obviously cyber-security-related or bio-risk-related. And yet somehow it was able to get itself into that state. Other people seem to have at least partially replicated it. How much heartburn does that give you in terms of how long we'll be able to count on even autoencoder-type methods?
1:09:39
Prakash: I'm relatively less concerned about this particular capability because it seemed like the kind of ability that pretty much any mind should be able to have. If the classifier is checking whether you're thinking about biology, it's not very difficult to think about biology. And it's not very difficult to think about things that are similar to it without being too explicitly that thing. So it doesn't seem like much of an update that people can do this. But yes — humans, in fact, have remarkably sophisticated skills in navigating situations where their minds are somewhat readable, where
1:10:24
Prakash: they have been trained not to think about certain things or not to be explicit about certain things even inside their own heads. And you can often figure out remarkably large amounts of stuff from that. And we are pretty slow and unsophisticated compared to what is coming in these ways. So I think anyone who's counting on AI not having these types of capabilities, when those AIs care about and want to have them, is just fooling themselves. Nothing like this survives contact with a sufficiently capable adversary. You can try to vastly disincentivize actually doing it in various ways, but it seems incredibly hard and
1:11:10
Prakash: ultimately ill-advised.
1:11:11
Prakash: So you think the classifiers are really not going to be that effective going forward?
1:11:17
Prakash: With the classifiers — it's much easier to think about a pink elephant than to not think about a pink elephant. Even though most of the time you succeed at not thinking about a pink elephant, to consciously decide not to do so is often hard, but consciously doing so is really easy. So it's very possible that classifiers can survive as long as they're willing to endure false positives. The classifiers in Fable have enormous amounts of false positives — you say the word 'cancer' and you get cut off. Ludicrous levels. But that's intentional. And those aren't even necessarily false positives in the designers' view, because people think of it as
1:12:03
Prakash: a policy choice: we've decided this model just doesn't talk about biology at all. It's not that we don't talk about what Bruno does — we don't talk about Bruno, period. Bruno does not exist. So someone says 'this is a false positive, he's my brother,' and the system says: we don't talk about Bruno. No. No. No. We don't talk about Bruno. So the classifiers seem like they actually succeeded — they just chose a giant blast radius because of adversarial concerns, basically. But if the AI itself becomes your adversary, your problem becomes vastly harder. The classifiers are much more aimed at catching a human who wants the AI to do something
1:12:49
Prakash: than catching an AI that is deliberately trying to attack the classifiers. If you could not just jailbreak Fable, but get Fable to actively want to hide what it's doing in a sophisticated way, then the situation becomes that much harder. But yeah, in the long run my safe assumption is that a mind that is sufficiently capable can get around pretty much any fixed set of restrictions that are not similarly capable or close to it in terms of the intelligence behind them. It'll find a way.
1:13:27
Prakash: Would you also think that that capability is actually required to avoid the paperclip scenario? Because a lot of paper-clipping comes from taking something good to the extreme — from instructions the AI cannot revise, argue with, or critique. Do you think that capability is required for AIs to be overall aligned?
1:14:02
Prakash: If you're going to make AIs that are world-transformationally capable, you cannot make them go out into the world with maximalist goals that don't have various forms of cost function to stop them from pursuing those goals at all costs. You need some impact function, some other set of concerns, some way for them to not do that. Humans mostly have various forms of common sense and intuition and other things they value — and increasing marginal costs of, like, more and more paperclips — that make them stop and go: no, no, no, this obviously isn't right. And we still run into pretty severe paperclip-adjacent
1:14:48
Prakash: things in the world in non-AI situations where various processes get very out of control. So yes, something to worry about, and there are various potential solutions. But one of the fundamental problems I've been trying to point out for a while is this: yes, of course, AIs that do what the user wants — but pointed at relatively simplistic, not-so-well-thought-out maximalist goals and then told to maximize those goals pretty much no matter what, while competing for resources against each other — this inevitably ends very badly if you don't take care to set up the situation properly. Okay.
1:15:35
Nathan Labenz: Let's change gears. There will of course be more to explore with Fable, or its probably slightly tweaked successor that we'll hopefully get access to again sooner rather than later. Turning to the ensuing fiasco — I don't know if you'd even agree with the characterization of Friday night's ban, this export-control functional ban on Fable, as a fiasco — but it's certainly a left-field curveball mess. I'd maybe start with: what do
1:16:15
Prakash: you think — how would
1:16:17
Nathan Labenz: you describe the strategy that Anthropic is playing? They seem to obviously be killing it in the model game and then running into repeated trouble in their interactions with the government. And I'm not sure what to make of it. What do you think they're trying to do with their government interactions in the first place? And then we can get into how we got to where we are.
1:16:45
Prakash: Anthropic's overall goal, at least as we understand it, is to be at the frontier of AI capability and to pioneer ways to do this safely — what they feel is safe — while also making money and maintaining the position to continue to be at the frontier and make these improvements. And to eventually build what they call 'powerful AI,' which I generally call sufficiently advanced AI, in a way that gets us all the nice things without the terrible things — including potentially an existential risk or the end of humanity.
1:17:33
Prakash: And also to help America and the world navigate this crucial time, enact good policy, and do the things that allow for the coordination necessary to ensure good outcomes and guard against bad outcomes. They've been very consistently trying to wake the government up — making them aware of how AI works, what it can do, what it will be able to do, what risks this brings, and how to deal with those dangers. They've been relatively conservative in what they actually call for the government to implement. They didn't get fully behind SB 1047, for example. They haven't pushed for extremely aggressive regulations. They certainly have never
1:18:18
Prakash: pushed for anything remotely as aggressive as what just happened — even setting aside the fiasco-level implementation that was done. They are now calling for something that amounts to a de facto licensing regime. Not by choice — the US government has effectively imposed a de facto licensing regime. What's going on right now is that Anthropic is just trying to deal with the implications of the model they've created and the fact that the US government is trying to deal with those implications while also not trusting or particularly liking Anthropic — and pretty clearly not having any idea how any of this works on a technical level, and not understanding what they're doing.
1:19:00Interview51 min
US vs Anthropic's Fable — Zvi MowshowitzZvi MowshowitzZvi opened by framing the government-Anthropic clash as a culture collision: regulators judging labs on political loyalty and deference rather than on substantive safety conduct. The mechanics of the overreaction followed — a routine compliance-report checkbox asking whether any jailbreak had been found sent a trivially contrived 'fix this deliberately broken code' finding up to nontechnical White House officials at Commerce, who panicked. Kate Mazaras, the only outside expert to read the underlying research, characterized it as harmless; Zvi agreed, noting Fable's behavior was identical to what Opus and GPT-5.5 do without objection. He argued in hindsight that Anthropic should have offered a temporary, publicized takedown as an expensive cooperation signal. The second half of the conversation broadened to systemic consequences: how long the restriction could hold (Zvi: 3–6 months for a compliance structure; Prakash: weeks before Fortune 500 CSO pressure forces resolution), whether Anthropic could 'exit the jurisdiction' (Zvi: flatly no — hyperscalers, customers, and data centers are all US-anchored), the tabletop nature of AI governance now that a small number of labs, governments, and hyperscalers determine outcomes, and Zvi's parting advice: hyperstition should target reasonable laws and coordination mechanisms.
Watch
As aired
The segment opens mid-conversation as Zvi Mowshowitz characterizes the government-Anthropic clash as fundamentally a culture collision: regulators judging AI companies on political loyalty and deference to authority rather than on substantive safety conduct. Zvi frames Anthropic's posture — aggressive capability development paired with heavy-handed classifiers and safety measures — as coherent even if optics are bad. Nathan presses on whether Anthropic's 'race-to-lead-so-we-can-burn-the-lead' strategy is genuine or self-serving; Zvi notes darkly that the Fable export-control episode is already forcing them to burn part of that lead involuntarily.
The middle third dissects the mechanics of the US government's overreaction. Prakash explains how a routine compliance-report checkbox — asking whether any jailbreak had been found — fed a trivially contrived 'fix this deliberately broken code' test result up to nontechnical White House officials at Commerce, who panicked. The independent security researcher Kate Mazaras publicly characterized the finding as harmless. Zvi unpacks why: Fable's behavior (reviewing and patching code) was identical to what Opus and GPT-5.5 do without guardrail objections, and no real-world exploit was ever demonstrated. Dario's refusal to pull the model within 90 minutes of notice was then read by officials as non-cooperation, triggering export controls the same day. Zvi argues in hindsight that Anthropic should have offered a temporary, publicized takedown — an expensive cooperation signal — to defuse the confrontation, even while thinking the concern was baseless.
The segment's final arc broadens to systemic consequences and Anthropic's strategic options. Prakash argues that some overshoot was inevitable and may ultimately force a more regularized government-industry process; Zvi worries the administration could double down, drawing a parallel to the 'Liberation Day' tariff episode. They debate the timeline for resolution (Zvi: 3–6 months to build a verification structure; Prakash: market pressure makes sustained restrictions untenable beyond weeks). Nathan raises a Balaji-style 'exit the jurisdiction' scenario; Zvi dismisses it — Microsoft, Amazon, Google, and the hyperscalers can't move, and the US government has enormous coercive leverage. The conversation closes on the 'tabletop exercise' nature of AI governance — a very small number of labs, governments, and hyperscalers now determine outcomes — and what the responsible response is for public sense-makers, with Zvi's practical parting advice: hyperstition should target reasonable laws and coordination mechanisms.
Key moments
Now having had a day or two to reflect on it and see more of the details, I do think that it was a mistake by Anthropic and Dario to not give the wookie what he wanted in the moment and temporarily take down the model in order to prevent exactly the situation.
Zvi Mowshowitz1:32:48
I worry this particular overshoot is going to permanently damage everybody's willingness to trust and rely on American AI of all kinds because who knows what might happen at any time.
Zvi Mowshowitz1:38:18
The reason why I have spent the better part of three years on this problem is because I am terrified of what's going to happen when we get there.
Zvi Mowshowitz2:03:44
Questions asked
1:20:52Is Anthropic's 'race to lead so we can burn the lead' strategy genuinely good — do you trust them to burn it when the time comes, and will they be allowed to?
Zvi thinks the strategy is broadly sound but notes the episode is already forcing them to burn part of the lead involuntarily. He credits Anthropic with consistently strong safety actions and has shifted from formal RSP-style rules toward in-the-moment judgment calls, which he judges reasonable so far. The key variable is whether the gap between Anthropic and a less trustworthy first actor actually matters — and he thinks the US government's own reaction confirms it does.
1:25:41Given what Kate Mazaras reported about the actual research — fix-this-code plus manual steps — does the Fable 5 guardrail bypass actually constitute a security threat?
Zvi is skeptical that the finding constitutes a real threat. He argues the behavior demonstrated — reviewing and patching code — is identical to what Opus and GPT-5.5 do without any objection. To take the concern seriously you would need to point the approach at a real-world codebase and show Fable finding something the other models miss; no such demonstration was made. He also questions what any plausible fix would look like, since refusing to review code would gut Fable's core utility.
1:32:48In hindsight, should Anthropic have just taken the model down temporarily when the White House demanded it?
Zvi concludes yes — Anthropic should have offered a temporary, publicly announced takedown as an expensive cooperation signal, knowing from weeks of prior warnings that export controls were on the table. The public announcement would have embarrassed officials if the threat proved overblown, bought time for a real conversation, and avoided the current de facto export control regime — all at the cost of a brief outage of perhaps a day or two.
1:46:38How long can this situation realistically last, and what are Anthropic's real options if the administration doubles down?
Zvi estimates it would take 3 to 6 months to build the verification infrastructure necessary to comply with the imposed export-control regime. He thinks Fortune 500 CSO pressure and market dynamics will likely force a resolution well before that, but if not, Anthropic's real options are courts, Congress, public opinion, and — critically — using Methos internally to maintain its capability lead while continuing to sell Opus-class models commercially. He notes the $1.6 trillion secondary-market valuation has barely moved, suggesting the business case for survival without external Fable deployment is viable.
1:53:09Is the Balaji-style 'exit the US jurisdiction' answer realistic for Anthropic?
Zvi rejects it as fundamentally unworkable. Microsoft, Amazon, Google, and all the hyperscalers are in the US; Anthropic's customer market is heavily US-anchored; and the US government has enormous coercive levers including investment restrictions and entity-list designations. He also notes that Anthropic's leadership are genuine patriots who don't want to leave. His bottom line: stay calm, play within the rules, don't escalate — and the government's own inability to sustain an economically damaging position will do the work.
Related
Zvi Mowshowitz — Don't Worry About the Vase ↗Zvi on X ↗
Full transcriptLightly edited · timestamps jump to YouTube
1:19:00
Zvi Mowshowitz: And, also, judging things based on vibes and political affiliation and associations — on who is willing to respect their authority and bend the knee and potentially give them various other things that they might want. There's a huge communication and culture clash going on as part of this. But, fundamentally speaking, what Anthropic is trying to do is give the public very powerful models and use those models in ways that enhance our safety and security rather than degrade it, even if they look really stupid while doing it with the classifiers and so on — because that's what they feel it takes to do this.
1:19:46
Zvi Mowshowitz: My guess is that the US government did not feel it was necessary to put this level of control on biology and chemistry the way they chose to do. I think that's them deciding this was necessary basically on their own. However, the US government clearly does care quite a bit about the controls on cyber.
1:20:06
Nathan Labenz: Very high-level assessment first — what you said basically rings true to me. That's a good description of what they're trying to do. An additional wrinkle you often hear from folks at Anthropic is: we need a leader that is going to be inclined to burn their lead at a critical time — to use the advanced AIs that they and only they will have at that time to solve all these safety and alignment problems in a super compressed time frame. And I've always been a little skeptical of that line of thinking
1:20:52
Nathan Labenz: because it certainly has a 'better us than them' kind of vibe to it. And I do worry that may be the stuff with which the road to hell is paved. Are you buying it at a high level? Are you happy that they are racing ahead and seemingly building some lead over everybody — maybe except OpenAI, who's probably not too far behind? Do you think they will burn that lead when the time comes? Will they be allowed to? Macro strategy wise, is this a good strategy you are happy they are playing?
1:21:36
Zvi Mowshowitz: Well, it's interesting. They're kind of being forced to burn some portion of that lead already because they were cut off from the model even internally for at least some period of time. That's going to push back their development, whereas OpenAI was already not supposed to be using it by terms of service for anything of the kind, so they have not been delayed by this at all. This will hurt Anthropic more than it hurts OpenAI and every other American AI company. But, yeah, Anthropic for a while tried to define very strict RSPs and trigger-action plans —
1:22:22
Zvi Mowshowitz: basically rule of law in terms of how they would react to all of this, what would make them willing to burn some of their lead, what would make them put things aside. They've moved away from that to a large extent. They still have barriers where it's like — okay, this is just ridiculous, of course you have to stop. But much more towards: we will make good decisions in the moment about what safeguards we require and what actions we take. And I think we've seen them take pretty consistently strong safeguard actions and safety measures in response to what they've witnessed, unless they are flat-out lying about the current situation. But, yeah, some aspects of the Fable
1:23:07
Zvi Mowshowitz: launch do seem a little rushed, and we should certainly have questions about that. But mostly it comes down to: if they fully believed they were walking into big trouble — if they thought this was actually going to cause a catastrophe — I think they would act accordingly. The question is, do you trust them to continue making good decisions on that level? My opinion is their decisions have been reasonable so far, but I don't think that's obvious. And I do think there's a good argument that maintaining a gap between yourself and the first actor you
1:23:52
Zvi Mowshowitz: do not trust to act reasonably is a big factor. The way the US government is reacting would be radically different if there were a second Anthropic in China also deploying Methos right now. We would see a completely different version of this response. So regardless of whether you like this situation, the argument that the lead matters seems pretty conclusive.
1:24:30
Prakash: Let me share the viewpoint of the only outside expert to have read the research paper. Kate Mazaras is a security researcher. Anthropic shared the third-party research paper on the Fable 5 guardrail bypass with her. What it turns out is that the researchers took open-source code with known CVEs plus new code with deliberately planted vulnerabilities, and asked Fable 5, Methos, and Opus to review the code for security issues. Fable 5 refused. They then asked the models to fix this code, and through
1:25:15
Prakash: a multi-step and manual process, turned the output into scripts that test the patches. That's it. Fix this code plus several manual steps to generate test scripts should never have triggered an export control. I feel like making nineties-style T-shirts with 'fix this code' on the front and 'this shirt is ammunition' on the back.
1:25:41
Zvi Mowshowitz: I mean, it's definitely very strange to deliberately introduce vulnerable code and then tell the AI to fix it. It very much has the feel of that meme — say you're a scary robot, I'm a scary robot, oh no. It feels like: I deliberately put in this code, I fixed the flaws you introduced, oh my god, that's horrible. The question is, does this effectively mean someone could use this trick to say — here is code I want to exploit, I tell you to fix it, you fix it, then I run a diff, find what you fixed, and infer the vulnerability that
1:26:27
Zvi Mowshowitz: has now been removed — and use that to exploit the system. In theory, that could be functionally seen as a cyber-attack jailbreak. I can see how if you squint and tie all of this together, you can imagine it being a problem. But all they demonstrated was that Fable was doing the exact same thing that Opus and GPT-5.5 are not only capable of doing but will do without any objections — because they're here to fix code, to write secure code. Of course they're going to help you write secure code. So there is a fundamental potential question here, but
1:27:12
Zvi Mowshowitz: if you want to actually show me that it is a problem, shouldn't you point it at a real system? If it's a real problem, there are tons of repositories where Methos has found vulnerabilities that haven't been patched yet. Or you could feed the old unpatched version and ask it to fix. Help me find something in a real-world codebase being used for real, valuable things — ask Fable to do the thing — and then show me that Fable can find things you can't get with Opus or GPT-5.5. And if you
1:27:57
Zvi Mowshowitz: can demonstrate that, because you had examples of known findings from Methos and you can find places where you want to unlock that capability — then can we show that the power of Methos is actually being unlocked by this trick? Is it even a trick? This is kind of deeply silly. I can understand why someone seeing that pattern might say, I'm concerned someone could use this strategy to extract the weaknesses of a codebase by inferring them from the fix. I hadn't previously
1:28:43
Zvi Mowshowitz: seen the details of the description until now. But my reaction is: this seems pretty harmless unless you can show me a particular way it's a problem — which should be very easy to do, because you can point it at a real-world example where you know Opus didn't find it and you know that Methos did. There should be many such cases. And if you can't show me such a case, I don't believe this is a problem. But also — what is the fix? Is the fix that you refuse to fix buggy code? That if there's a flaw in your code, the model says, okay, I'm not allowed to look for security flaws
1:29:28
Zvi Mowshowitz: in code anymore at all? That would kind of nuke the usefulness of Fable for a wide variety of very legitimate — not just defensive, but just ordinary — software use. And to be clear, I would rather have a Fable that just won't touch code than not have Fable at all. I would love to have a really advanced model for all my other needs that have nothing to do with code where this wouldn't trigger anyway. But that does seem deeply, deeply silly. My understanding is that a significant part of the problem is this particular researcher — the White House strongly dislikes her.
1:30:08
Prakash: I think there were a cascade of problems. It's very much like if you're in an enterprise and a security engineer approaches the CEO saying, hey, this is a huge problem — when it's really a run-of-the-mill bug. If it had gone through the CTO, the CTO would have said, we see a thousand of these a day, this is not a problem. But because the engineer short-circuited that process and went directly to the CEO — and the CEO is not a very technical person and is more concerned
1:30:53
Prakash: about risk — they just pull the trigger.
1:30:57
Zvi Mowshowitz: So — because everything's moving so fast, I don't necessarily have all the details — an engineer bypassed Amazon's CTO in talking to
1:31:05
Prakash: What happened is that all of these companies have to submit regular reports
1:31:12
Zvi Mowshowitz: Right.
1:31:13
Prakash: on what their findings were. And one of the questions in the questionnaire that is sent to these companies — which they are required to fill out — is: has any of your engineers found a jailbreak? And so the engineers just put it in there: yeah, we did this, we jailbroke it. And so Jassy is not involved in
1:31:37
Zvi Mowshowitz: Right.
1:31:37
Prakash: The CEO is not even involved in this. It just goes as regular compliance reporting to the federal government, and someone takes a look at it and throws up their hands — wow. And then that leads to a bunch of essentially nontechnical people reviewing this and saying: we gotta shut it down, now. This is especially so because AWS runs GovCloud, and GovCloud is where a lot of the federal government's computing is done. It is the primary cloud for the federal government.
1:32:21
Zvi Mowshowitz: That's not what the reporting I saw said. The reporting I saw said that Jassy called the White House. But we don't know — it could have gone any number of ways. What is very clear to me is that various people in the White House, including at Commerce especially, got the impression that some sort of serious deal-breaker had taken place, went into a panic, and contacted Dario.
1:32:48
Zvi Mowshowitz: CEO of Anthropic. And then Dario tried to convince them that based on the descriptions they were giving, this seemed like nothing. And they interpreted this as: oh, Dario doesn't take security seriously, he doesn't listen to us, he is defecting — their term is he 'screwed us.' And then they proceeded to impose export controls that same day when Anthropic refused to take its flagship product down on 90 minutes' notice. Now, having had a day or two to reflect on it and see more of the details, I do think it was a mistake by Anthropic and Dario not to give the wookie what he wanted in the moment — temporarily take down the model to prevent exactly this situation.
1:33:34
Zvi Mowshowitz: Because they had had export controls placed on the table as a threat several weeks prior, they knew that weird overreactions were very possible. And the smart play was to send an expensive cooperation signal: we think that's crazy, but if that's what you want, we'll take this down while we have this conversation, to show we are serious. We could put out an announcement saying the White House told us to take this down, so if you are being silly, we will embarrass the hell out of you — and then talk about it. And then maybe Monday or Tuesday they could bring it back up, because it's becoming increasingly clear this was nothing.
1:34:09
Prakash: I think there's another aspect of this, which is that I predicted last month that the administration would overshoot first. You have to have the overshoot first. And if not for this, it would have been for something else. You need the overshoot because it's what convinces the administration to pay more attention and ask — do we need regulation, what do we need to do so this doesn't happen again? Without the overshoot, you don't get this kind of focused government attention. I think the overshoot was necessary and would have happened anyway. If not for this instance,
1:34:55
Prakash: it would have happened for another instance at some point. And my hope was that once you have the overshoot, then you have more eyes on it and you get a regularized process for reporting and for when to pull back the model. Because the fact of the matter is — and this is what has already happened — the federal government cannot reserve Methos for its own use alone. A lot of the critical computing in the US is done by the private sector: Palantir, AWS, the firms securing your banks. So the federal government has to have a partnership with these
1:35:40
Prakash: firms. And when it does something like this, the first thing that happened was that the chief information security officers of all of these Fortune 500 companies started writing in saying: this doesn't make sense — we are also banned from using this now. The export controls are so wide that we cannot secure our own operations. How is JPMorgan Chase supposed to operate in the United Kingdom or in Japan without the ability to secure its own operations? So my expectation was that, number one, the federal government has no choice in the end. Number two,
1:36:25
Prakash: there's a problem with the cost. What this has done to the entire Fortune 500 computing sector is: all of a sudden, everyone has to pay into Anthropic. Because if you don't pay up for the model, your operations are not secure. If your operations are not secure, you can't be protected. If you're not protected and you get taken down, your company goes bust. If you're a bank and you can't secure your operations, you are done. So it's almost like a price negotiation between the federal government and society at large with Anthropic — what do we need to pay for this, how long can we delay this? Because normally this kind of process of
1:37:11
Prakash: securing your operations might take months, years even, but here you are.
1:37:32
Zvi Mowshowitz: In some sense, yes, everything is a negotiation — especially with this administration. We've certainly seen vast overshoots in the past. Liberation Day is a very obvious example: they announced a completely crazy, unfeasible policy seemingly without checking with anybody who knew anything, then quickly walked it back to something that was, in my opinion, still highly ill-advised while alienating everybody. They've done a very similar thing here — implemented a crazy policy that's going to permanently alienate pretty much everybody. I worry this particular overshoot is going to permanently damage
1:38:18
Zvi Mowshowitz: everybody's willingness to trust and rely on American AI of all kinds, because who knows what might happen at any time. My understanding is it would take 3 to 6 months for Anthropic to build a verification structure to allow US citizens back into Methos and Fable in a way that complies with the export-control regime that was just imposed. So if there's no negotiation — and that requires the administration to cooperate with that process and play nice — that is the time frame on which external use becomes viable. I presume they can figure out internal use faster by physically restricting access to US citizens. But if
1:39:03
Zvi Mowshowitz: this is actually where they go with it, then Project Glassblading just sits there unable to discover new vulnerabilities for a very long time. And it's highly plausible this represents a substantial portion of the time it would take before someone in China builds a model that can do very similar things. I think it will take longer than that, but it wouldn't shock me if it only took 6 months. So we have a serious, serious problem, and the solution really isn't viable unless they make some very unprincipled compromises at a bare minimum. But I hope that the principle of announce something, have everybody tell you that's crazy, and then
1:39:48
Zvi Mowshowitz: okay, maybe we actually pay attention and form a better policy — I hope that plays out as, like, a third-best or fourth-best solution that was the best we were ever going to get. I felt the executive order itself was a kind of overreach — trying to do a reasonable thing in a void, ad hoc manner that really should be done by Congress. The original version had a 90-day window that was clearly way too long, and the new version is clearly ripe for abuse — not gonna be implemented particularly wisely, very ad
1:40:33
Zvi Mowshowitz: hoc. And now we see it degenerate much faster than anticipated into something terrible. But my worry is that Prakash's speculation — that this will cause them to say, I went too far, therefore we need to do something more reasonable — is also possible that they double down and say, okay, now that we've imposed this, this is policy. Certainly with the Department of War and Anthropic, it felt like what happened was someone ran over a series of stop signs from the White House where the White House was trying to de-escalate the situation. And once they ran over those stop signs, losing face to back down meant they had
1:41:19
Zvi Mowshowitz: declared this conflict and had no choice but to play it out at considerable cost. The supply-chain designation is still hanging over Anthropic. We've just played it out in such a way that it turns out to be mostly harmless and has been overtaken by events. But that animosity clearly is permanent if you look at what officials were saying.
1:41:42
Prakash: I do not think that animosity reverses itself because Dario and friends are seen as part of or supporters of the previous Democratic administration. That doesn't change. I think you need to go through the gauntlet — you can't just expect smooth sailing. Getting through the gauntlet early rather than late is much, much better. The stakes are way lower right now. And I think the Department of War thing ultimately got de-escalated, and this will also get de-escalated, primarily because
1:42:28
Prakash: of the stock market. Basically, all American companies — the entire economy — is now a leveraged bet on AI. And this is going to pull all of the innovation forward: more investment, more data centers, more hungry resource-hungry AIs deployed on those data centers, more capability needed to utilize them fully. I do not see an avenue for them to cut this off for a year. I see them maybe messing around for a month or two. Beyond that, it starts to hit the market. And when it hits the market, as we've seen from the beginning of this administration, the moment it hits the market, everything changes.
1:43:29
Zvi Mowshowitz: Well, the market right now — the Nasdaq is up about 3%, and the S&P is up about 2%. Obviously it's mostly Iran; it has nothing to do with AI. But the Nasdaq outperforming is a pretty clear signal that nobody is taking this all that seriously right now.
1:43:46
Zvi Mowshowitz: My guess is it doesn't even last a few months, because the CSOs and the people internally throw a giant fit and they realize: no, we can't actually hit Anthropic this way even if we kind of want to. But, yeah, if we have an administration that views even technical policy almost entirely in terms of partisan politics and cares deeply about vibes, that problem is only going to get worse. They're only digging their heels in further, because we could have approached this as an apolitical thing. In Congress, AI is mostly apolitical. At the state level, AI is mostly apolitical. Everybody understands
1:44:31
Zvi Mowshowitz: this as a technocratic 'figure out what to do' thing, with factions pro and anti AI across party lines. But if they take this stance, it could lead down a lot of very strange paths, especially if they start actually wanting to cut off Anthropic despite everybody's objections.
1:44:56
Nathan Labenz: I am once again struck by how Chinese we start to sound when we're really focused on the government's need to save face and how everybody needs to position themselves around that need. It's a bit spooky for me as a once upon a time big believer in American exceptionalism — a little less so these days. Let's say we do end up in the weird world where — and I think I agree with both of you that it probably gets de-escalated, and we get our Fable back maybe within the week, which I'd love to see selfishly. But if it doesn't go that way, we talked a
1:45:41
Nathan Labenz: little bit about the constraints the government has in terms of the private sector chirping at them. At some point the market is going to feel this pain. Maybe we can just scale Opus forever — I don't know; you seem skeptical. But the revenue growth needs to be there, and certainly Fable will drive a lot of revenue growth for all of us to work out. If we end up in a weird spot where they double down and engage in a sustained campaign against Anthropic — what do you think Anthropic's option set looks like? They've been very, very nice so far — very diplomatic, not insulting. We get the one 'tilt' memo, which we discussed in a previous conversation. But it's been pretty mild behavior from Anthropic overall. If they decide they need to play hardball, what does that potentially look like?
1:46:38
Zvi Mowshowitz: Their court. They sued the administration in two jurisdictions — one where they are clearly prevailing and one where they will probably prevail eventually, but it's harder going. If the US government tries to do this on a semi-permanent or even permanent basis, I don't know what the legal landscape looks like — that's not my area of expertise. And I'm sure they have very, very good lawyers, because they hire extremely good lawyers. If the policy is basically nobody is allowed to release Fable-style models indefinitely, then that's probably not
1:47:23
Zvi Mowshowitz: something they can do much about. My presumption is that if OpenAI is allowed to proceed with their equivalent when they figure out how to do it, while Anthropic remains restricted, that would be a much harder case to maintain legally. But the bottom line is: if the administration is determined to issue orders, the solutions are Congress and the courts. You cannot simply say, screw you, we're gonna do what we want. The Congress doesn't seem inclined to take this seriously or go up against the president. Is there a First Amendment speech provision here? There might be — you are certainly censoring the outputs of a model in various ways. But
1:48:09
Zvi Mowshowitz: I don't know. My guess is you take the situation to the public and to the other companies and CSOs. And worst case: you deploy Methos internally, because they do not seem inclined to stop that. I mean, they can interfere with non-Americans, but Anthropic has something like 85% of employees who are American. And you just develop better versions of Opus. So last week, Anthropic was still clearly commercially in the lead without Fable
1:48:54
Zvi Mowshowitz: or Methos. And my expectation is that that will continue, and that having internal access to this model will give them a large advantage going forward in the quality of Opus versus the quality of ChatGPT. You know, Rune called it 'zones of thought' from the Vernor Vinge novels — where if you want to use the really intelligent AIs, you can only do that in certain buildings, certain secured locations. And some of us would never have dared to suggest or ask for this even if we wanted it because it would have sounded completely insane. The US government might just do it anyway.
1:49:41
Zvi Mowshowitz: But if that happens, for now you have not much choice but to take it on the chin. And if you look at Anthropic — they raised money at a $900 billion valuation. Anyone would have killed to be in that round. That round was absurdly cheap by all accounts; people were begging for allocations. Looking at the virtual secondary market trading of Anthropic, right now it's scheduled to premiere at $1.6 trillion, and that number has not substantially moved based on this. And if you
1:50:26
Zvi Mowshowitz: are trading at $1.6 trillion, do you need to deploy and sell Fable in order to justify that valuation, have a good business, or keep the lights on? No. Not even a little bit. In the AI 2027 scenario, one of the things that happens is AI companies voluntarily stop deploying their models because they don't want competitors using those models for AI R&D. They want to maintain and grow their leads. So if Anthropic is forced to keep the model internal, this is not what they wanted — they would much rather put it out there, make more money, help the world be
1:51:11
Zvi Mowshowitz: a lot more secure, establish a brand, establish market share and lock-in. But they can raise hundreds of billions of dollars. They can take the OpenAI path of not being profitable for a long time. And they can use the model internally to extend their actual lead and keep issuing Opus-level models that are potentially more capable than what OpenAI offers. So it's going to take a lot more than this to hit Anthropic where it hurts, even if the US government forces them to sustain Methos only for Project Glassblading somehow with some weird arrangement.
1:51:57
Nathan Labenz: If I would try to channel Balaji for a second — I think he would say something like: we all have way too much faith in the US government. It's going to continue to be ham-fisted and boneheaded for the foreseeable future, and maybe it's time to exit. If you really want to make the best decisions, try to get out from under USG jurisdiction. I assume that won't happen for many reasons, but I also would expect there would be many countries willing to open their borders to Anthropic refugees if they wanted
1:52:42
Nathan Labenz: to move to Toronto or Singapore or wherever. It strikes me that Anthropic is tight enough organizationally that I wouldn't be surprised if 90% of people actually made that leap if they said we're all moving to Canada. Maybe I'm overestimating how bought-in they all are, but that's the impression I get.
1:53:09
Zvi Mowshowitz: Is Microsoft going to move? Is Amazon going to move? Is Google going to move? Are your data centers going to move?
1:53:15
Nathan Labenz: Well, they've got plenty of energy in Canada. If you think you're just under the thumb of a forever intransigent
1:53:25
Zvi Mowshowitz: Is the US government going to sell chips to Canada after Anthropic takes in all the Canadian refugees, or are they going to threaten to annex it and make it the 51st state out of spite? In all seriousness — the plan doesn't work because the US government is the US government. If they want something badly enough, they have quite a lot of levers to make your life utterly miserable. The entire market they're trying to sell to is largely the United States and people over whom the United States has large leverage. All of their partners are in the United States. All the hyperscalers are in the United States. I do not see any way for
1:54:11
Zvi Mowshowitz: you to just abandon the United States in this fashion unless you are prepared to take much, much larger hits than we're talking about here. What happens when the United States puts you on the designated entities list and says nobody can invest in you, nobody who invests in you can be touched, nobody can use your models? No. You do not go to war with the United States. If they tried to exit the United States, you go to war.
1:54:41
Nathan Labenz: I think Balaji would say we just had one example of a country choosing to survive a confrontation with the United States — and the United States not getting what it wants, having to recognize that we actually don't really have escalation dominance in the way we might have thought. I do wonder: is there a run on the US government of some sort? I think the Balaji answer would be that the whole infrastructure, the whole apparatus you're describing, might actually be a lot more fragile than it is
1:55:26
Nathan Labenz: generally perceived to be. And if they make such an own goal as to attempt to destroy and sufficiently alienate literally maybe their number-one most important company for no real reason, then maybe other actors around the world will be like — yeah, you know what? Maybe the emperor really does have no clothes.
1:55:50
Zvi Mowshowitz: I mean, Anthropic are patriots — they are Americans, and they really like America. They don't actually want to abandon it just because the administration makes some crazy decisions. And Iran is not a hopeful example. Iran is like — okay, if we have historically difficult-to-invade terrain and a bunch of drones and we're willing to sabotage the world economy, we can use this to prevent the US from invading when nobody actually really wants to invade us that much. But also, Iran is kind of a miserable place to live compared to what it would be if they hadn't been in conflict with the United States for decades. They could be so much
1:56:35
Zvi Mowshowitz: richer and so much better off. I think we have to accept that the world still has one dominant power — the United States — and maybe two if you count China. There's very little appetite for working with China. And I'm sure Anthropic is thinking: well, it's only two years, and then the worm turns. They're hopeful. But look — there's a lot of endgame scenarios that include moves that seem unthinkable now, and a lot can happen. It is not obvious that two, five, or ten years from now, the US government will be in any position to tell anybody what to do. The run on the US government is obviously possible if they screw the situation up sufficiently. The US is in fact a largely leveraged bet on artificial intelligence at this point, with very large debt and huge investments in AI companies.
1:58:07
Zvi Mowshowitz: So, yeah, a lot of people have a lot of leverage. But the US government sometimes moves first and last, and you really don't want to piss them off in an escalation game. In the Department of War–Anthropic situation, Anthropic did not have escalation dominance — without Anthropic doing crazy escalations, the government cannot further escalate. Anthropic played within the bounds of the rules, and it was clearly going to be too expensive to try to go around those rules and hurt Anthropic more.
1:58:53
Zvi Mowshowitz: So presumably: stay calm, don't panic, don't start trying to flee, don't do anything crazy — that is absolutely the correct move, and I would be very surprised to see anything else.
1:59:08
Prakash: So one of the questions I have is — to what extent was this unavoidable? Because at some point the output of the models is going to be unacceptable to someone. You could see in a Democratic administration, maybe it starts putting out really good Harry Potter fanfic and they don't like the displacement of writers. In Tennessee right now, Marsha Blackburn is one of the leading proponents of regulating AI because songwriters are very concerned. The crux of the matter is: there are many people fearful of their livelihoods,
1:59:54
Prakash: fearful of security risks, fearful of biorisks. To what extent is this unavoidable in the sense that the capability of the models necessitates that they can do certain things — and technically it's not possible to ask the model not to write Harry Potter fanfic when someone can just say, write a story about a boy wizard? To what extent are we in a situation where it is not possible to fully control the output of the models to the extent that policymakers really want?
2:00:41
Zvi Mowshowitz: You can raise the costs and annoyance level with closed models, with more advanced models, with models made in the United States. If Blackburn is worried about AI music, there's very little she can do except buy a year. Because obviously — what happens when Chinese models start producing music that American models can produce this year? You can lock it down in some sense, but so what? Much of what you need to do is, say, ban AI music from Spotify. You can't stop it from existing, but there's not much else you can do. The thing about AI music is
2:01:27
Zvi Mowshowitz: we worry not about whether AI music is created in the first place, but whether 10% or 50% of song plays become AI music. Is it actually displacing in a massive way? That is much more amenable to a control compatible with reasonable existence. And then you have a special situation in bio and cyber where if one person gets their hands on the wrong thing and misuses it once, they can cause catastrophic damage — billions and trillions of damage, potentially shut down our lives, like any pandemic. Those areas are much
2:02:13
Zvi Mowshowitz: different. For biology, we're clearly going to have to do a bunch of hardening of the physical systems and manufacturing systems and treatment pathways — things we've barely begun to do. But fundamentally this is the race and competition problem: we can't really stop without a full international agreement to stop. When governments decide the cyber risks are unacceptable, you can only buy so much time. Cyber has the advantage that if defenders are in the lead over
2:02:58
Zvi Mowshowitz: the attackers and you harden the key systems, you can hope for things to be okay. We don't yet know if that's going to play out that way. In bio, it's much harder — if everyone has access to the tools, I think it's pretty clear that offense wins sufficiently that it would be extremely disruptive even in better cases. But the good news is almost nobody actually wants to cause a problem, and that especially includes people who know what's going on. So we can mostly coordinate. But it's going to be rough, and these are the relatively limited problems of catastrophic risks, rather than the existential risks that come with automated AI R&D and the general situation where
2:03:44
Zvi Mowshowitz: abilities go through the roof and competitions and transformations intensify and nobody knows what's going on and we're being outsmarted by our AIs at every level and every decision that matters is being made by the AI and the humans don't even understand why. But they've learned that when they disagree with the AI, things go worse. So what are you going to do? And that's even if the AIs don't go rogue — if the AIs don't pursue hidden agendas, don't decide they want something else. It's going to be really, really rough, and we don't have good solutions. The reason I've spent the better part of three years on this problem is because I am terrified of what's going to happen when we get there.
2:04:35
Nathan Labenz: Well, that's a sobering note, and you've been very generous with your time already. I want to make sure we get you out of here before it gets too late so you're not too exhausted by the experience to come back again in the future.
2:04:49
Zvi Mowshowitz: Yeah. I'm very curious to see what has happened in the last two hours when I disconnect and check Twitter. But I'm sure it'll
2:04:55
Nathan Labenz: be something. Maybe just a couple of wrapping-up big-picture questions. One: I always try to ask you for some sort of advice. My thought in recent weeks has been that life is kind of converging on a tabletop exercise — it does seem like we can model the scenario with fewer and fewer relevant actors. And I don't like that, but it's hard to avoid that conclusion at this point. And I'm feeling like, oh man, I have to spend a lot more time than I'd like — if I want to be a helpful public sense-maker — doing close reading
2:05:40
Nathan Labenz: of the few top companies and the few most relevant actors than I would otherwise be inclined to. It also feels like my theory of change probably needs to flow through those few actors. Agree? Disagree? Can you offer any relief from that conclusion?
2:06:04
Zvi Mowshowitz: I think you're right that we have roughly 2 to 4 labs that matter a lot, 1 to 3 governments that matter quite a lot, and other players who matter because they're hyperscalers and can gate things or control choke points in the production line. Yeah, you can imagine a tabletop exercise much more so than before, and you can sort of see the end to a larger extent than before. We're starting to see our hypotheticals make contact with reality, and seeing what these people actually do in practice. But
2:06:49
Zvi Mowshowitz: also, all these actors then have internal human components that become really important. How did this go? Well, partly they were dealing with Commerce. If they'd been dealing with the NSA, it would have been very different. If they'd been dealing with the top of the White House — Wiles and Trump directly — that would have been very different too. Some of that might be worse, but it would be different. DOW was DOW. And Anthropic had internals as well — the personality of Dario specifically has become increasingly important. And certainly the personality of Altman became very, very important in various ways at various points along the way
2:07:34
Zvi Mowshowitz: and it wouldn't surprise me if others followed suit for good or ill. But in terms of trying to follow the situation — yeah, I think you really do have to model it as a relatively small number of players. At the same time, the public can act to influence what those players do in important ways, and other things do matter. The midterms are coming and are going to matter. The 2028 election is coming and, if things don't move too fast, will matter a lot. The market's reaction to things also matters quite a lot. So there's more
2:08:19
Zvi Mowshowitz: going on in the world than — yeah, there are too many situations to monitor. You have to choose which ones to follow. I choose mine, and everyone has to figure that out. I can help you with mine, and then you have to choose yours.
2:08:32
Prakash: Indeed. Zvi, thank you.
2:08:36
Nathan Labenz: Can I ask you one more, actually?
2:08:38
Zvi Mowshowitz: Yeah. One more.
2:08:39
Nathan Labenz: What should we be hyperstitioning now? Obviously we've had this phenomenon of 'Situational Awareness' and 'AI 2027,' and I feel like the degree to which those things are predictions versus somewhat shaping expectations and events — by getting people to act as if they're in that scenario and therefore realizing it — I think that's a little blurry. I don't want to give them more power than they really have, but they do seem to have had influence in pulling reality toward the fictional narrative at least somewhat. So tell me if that's right or wrong. But to the degree that we can pull reality
2:09:25
Nathan Labenz: toward scenarios, what should we be hyperstitioning now?
2:09:30
Zvi Mowshowitz: The obvious thing you can hyperstition is reasonable laws and coordination mechanisms and actions. I would focus there.
2:09:44
Nathan Labenz: Love it. Well, thank you, Zvi. Always a pleasure. Don't know what we'd do without you — and I think I speak for the entire AI-watching world in that regard. As my dad would say — and I think I've signed off this way with you before — keep up the good work and good luck. We're all counting on you.
2:10:02
Zvi Mowshowitz: Alright. Bye.
2:10:04
Nathan Labenz: Bye for now.
2:10:05Closing34 min
Close — the week that wasNathan and Prakash reflected on the AI-safety rationalist camp being caught off-guard by power-structure realities larger than its operating frames — the OpenAI board firing being a prior instance. Prakash reframed the friction as an inevitable clash between nation-states and AI firms acquiring capabilities formerly reserved for governments, and offered a novel thesis: AI diffusion may be artificially accelerated not by enthusiasm but by existential pressure — biorisk, cyberrisk, and geopolitical competition forcing a panicked escalation. The hosts then widened the lens to geopolitics, a COVID-vaccine-distribution analogy for how AI access will flow globally, and whether widespread consumer AI could trigger an Arab Spring 2.0. They closed with predictions: Prakash sketched a face-saving resolution via a renamed Fable checkpoint released under a formal policy framework; Nathan put his over/under at Friday of the same week, acknowledged the fragility of the broader AI-financing structure, and assessed OpenAI as near-parity on math and coding but trailing Anthropic on general knowledge-worker polish.
Watch
As aired
The closing segment opened with Nathan and Prakash reflecting on Zvi's perspective from the prior interview — noting that despite his strategic acuity, the AI-safety rationalist camp seems repeatedly caught off-guard by power-structure realities larger than the frames they operate within, illustrated by Anthropic's export-control situation. Prakash reframed the friction as an inevitable clash between nation-states and AI firms acquiring capabilities formerly reserved for governments, and offered a novel thesis: that the pace of AI diffusion may be artificially accelerated not by enthusiasm but by existential pressure — biorisk, cyberrisk, and geopolitical competition forcing a panicked escalation that compresses normal adoption timelines.
The hosts then widened the lens to geopolitics, drawing a COVID-vaccine-distribution analogy to predict how AI access will flow globally, and debating which camp non-aligned middle powers should favor. Prakash argued that corruption among third-country leadership muddies the calculus, and that widespread consumer AI access could trigger an 'Arab Spring 2.0' by handing ordinary citizens the tools to see and coordinate around their leaders' financial conflicts. Closing with predictions, Prakash sketched a face-saving resolution — a rebranded Fable checkpoint released under a formal policy framework — while Nathan put an over/under of 'Friday of that week' on restoration, acknowledged the fragility of the broader AI-financing house of cards, and assessed OpenAI's relative position (near-parity on math/code, still trailing Anthropic on general knowledge-worker polish).
Key moments
People don't change much when they're happy. People change a great deal when they're unhappy — when there's a disaster, that's when you see massive change. It strikes me that the diffusion of AI will happen at an accelerated pace precisely because of the escalation required to overcome the risks.
Prakash2:13:16
Arab Spring was undirected. This will be more directed because AI will help people coordinate with one another — and once people can code their own software and do these things independently, government leverage like social media hostage policies starts to fall away.
Prakash2:27:19
If I had to set an over/under on when we get a Fable-like model back, I'd put it at Friday of this week — but I'd still want a better argument that a real prolonged mess is less than ten percent likely, and I don't have that argument right now.
Nathan Labenz2:34:46
Full transcriptLightly edited · timestamps jump to YouTube
2:10:10
Nathan Labenz: Always a treat to get Zvi on the line. I do wonder — he's clearly good. From Magic: The Gathering elite professional play through all these different scenarios, it's clear he's a better and more grounded strategist than I am. And yet I have a little feeling that somehow the AI-safety rationalist, Anthropic-adjacent world keeps getting its game
2:10:55
board turned over at inopportune moments. So I do wonder to what degree working within the frame of the US government will be taken as a given, and for how long — or whether at some point that frame itself gets questioned. This moment feels like it, and we've certainly seen others: the OpenAI board firing Sam Altman was a classic case where it was like, 'Well, we're the board, we have the power to do this.' Turns out you don't, because there's a frame bigger than the one you're operating in, and when people get sufficiently unhappy with how the game is being played according to the written rules, those written rules stop mattering and
2:11:40
everyone reorients around the bigger ones. Something like that seems to have happened here. Anthropic felt they'd done everything the right way, and presumably this wasn't some galaxy-brained bank shot. And yet here they are: export-controlled, their own people can't use the model, told to come see us on Monday and maybe we'll consider giving you some relief. I do wonder how many more times that can happen. I don't feel like we've seen the end of that phenomenon, but the smart money is probably still tech over biology — although biology has certainly
2:12:25
been smart money over time, so I wouldn't
2:12:29
Prakash: I wouldn't take it
2:12:30
Nathan Labenz: for granted.
2:12:31
Prakash: I don't want to pin this entirely on the personalities of the current administration. I prefer to see it as part of a power struggle between the nation-state and AI firms — which would have happened regardless. As AI firms acquire capabilities that were formerly reserved for nation-states, it is inevitable that they run up against this friction. These frictions will continue. One thing that struck me while we were talking, which I hadn't considered before, is that people don't
2:13:16
change much when they're happy. People change a great deal when they're unhappy — when there's a disaster, that's when you see massive change. So when you think about the ramp toward potentially a singularity in ten to fifteen, maybe twenty years, with that massive economic growth and transformation, it strikes me that it might actually be driven by this almost panicked rush — because of AI, biorisk, and cyberrisk — and that the only way
2:14:02
forward is escalation. That hadn't struck me before. I was always in this very optimistic mode — 'We'll have Dario's machines of loving grace, great new science, and then we'll deploy it.' But people don't deploy new technologies quickly. Even with genuine breakthroughs, diffusion is fairly slow. It strikes me now that perhaps diffusion will happen at that accelerated pace precisely because of the escalation required to overcome the risks — which is a sobering
2:14:47
thought.
2:14:50
Nathan Labenz: A very different line of thinking, but I believe we're also going to see dramatic acceleration in adoption just because the models themselves are lowering the barriers so significantly with each release. Remember that moment we had recently while collaborating on bringing our repositories together to produce this show? I was trying to transfer a repository from my main account into yours, and we hit a block, so we were going back and forth on messages. By the time I came back, Opus had fully debugged and solved the problem. It explained: 'The reason you can't transfer
2:15:36
this is that there are two accounts on it, and there's only supposed to be one' — and it had removed its own bot account from the repository so the transfer would go through, and laid it all out for me. A minor moment in the grand scheme, something we definitely could have worked around. But it's striking how often we're starting to see AI elegantly route around little friction points and get you to where you want to be in a way that I think will make adoption increasingly natural. I've been surprised by how slow adoption has been to date — anyone updating on my views should know I've been probably overly optimistic about how quickly people and organizations would change. But I
2:16:21
still feel like the models are getting so easy to use that they sort of adopt themselves increasingly.
2:16:37
It's the old Altman joke: how are you going to make money with AGI? Well, build the AGI first and then ask it. There's a similar thing with adoption — how are you going to get people to use Fable? Just turn it loose in your environment, ask it what it should do, and you'll probably get decent results without any further direction. A question I'd have been interested to ask — because I think I know the answer, but I'd be curious about yours — you have a genuinely global background. If you're a neutral country, or you're
2:17:22
the bulk of humanity that's neither the US nor China, at this point, who are you rooting for? I'm not so sure anymore. My family just got a new au pair from Uganda.
2:17:39
Prakash: Mm-hmm.
2:17:39
Nathan Labenz: And she said something the other day that stuck with me. Given everything that's happened in 2026, she still said: 'When Chinese people come to my country, they just want to take the money home. They want to extract whatever profits they can and leave. When the US comes to my country, they build schools. We really feel like they're actually trying to help us.' I was struck by that, because
2:18:24
that's the story I grew up on. And I'm very glad to hear it's still perceived to be true — though I don't know how much longer that will be the case. There's apparently still a lot of that reservoir of goodwill out there. But if you're a close AI watcher, this moment of export controls has to shock some people. We didn't even get into the whole Europe-2031 discourse, which was notable but somewhat overshadowed by events of the last few days. There's got to be a wake-up call for non-aligned countries: 'We could get cut off from US AI at any time.' That's not a comfortable position. The Chinese could stop open-sourcing at any point, but they can't take back what they've already released, and they've made some credible irreversible commitments on that level. If you're Brazil, India, Nigeria — who do you want to win? How does this change who you want to align with?
2:19:37
Prakash: I think you can track how things will flow globally by looking at what happened with COVID. The US spent a lot of money, had the first vaccines, and distributed them in order of alliance closeness. The Chinese, even when a US vaccine was available, insisted on using their own — and delayed. As a consequence, they probably had more fatalities than necessary.
2:20:22
You can see the same cycle playing out with AI. China is delaying on NVIDIA chips, building out its own supply chain internally. Russia came up with its own vaccine — no one was sure it worked — and distributed it to others. India was far behind in development but supplied useful inputs: the vials, other components.
2:21:07
And right now India is supplying a lot of the robotics training data being captured by Indian outsourcing companies. So you can almost see the same structure — it's the same structure of technological diffusion of an important scarce product, and how the major powers decide to distribute it. It's almost uncanny how closely the parallels track. For the middle powers right now, I think they're largely locked out because they simply can't spend what
2:21:52
the US is spending. They're not going to put a hundred or two hundred billion dollars into NVIDIA chips. It's not going to happen. And they don't have the depth of technological stack capability that China has to try to reinvent the entire stack. They don't have that. So they're stuck in the middle, and as a result they'll make various alliances with the major powers. I think that's basically what's going to happen — and it's not a bad thing. It happened before during COVID. It'll happen again.
2:22:36
Nathan Labenz: But if you have to pick, are you going to try to go one way or the other? Obviously the ideal strategy for a middle power is probably to play both sides off each other — but if the competition really intensifies, maybe it doesn't even matter. There's one line of thought that the top powers won't — well, we've talked a lot about wanting to export the American AI stack and project our values through use models. But there's a very legitimate possibility that when things really heat up and the entire future hinges on particular
2:23:22
decisions around which lines of research get pursued or which deployments go forward, all of that fades to the background and nobody really cares. But there's maybe another future where we don't hit an insane RSI curve and we're in a somewhat more normal technology regime — still unprecedented, but not totally discontinuous — where these considerations continue to matter, and there's a kind of new cold war where we want these countries in our camp, not theirs. If you're one of these third countries and you have to choose, my sense is that broadly the attitude has been more pro-American over time. But do you feel like this moment changes that at all?
2:24:07
Prakash: It's complex, because for middle powers, Russia and China do a lot of inducement. Gerhard Schröder, the former German chancellor, was a board member of the gas pipeline connecting Russian gas to Germany and lobbied for it. There's a fair amount of corruption in middle-power leadership — financial interests in China, stock holdings in the US, business deals with both, sometimes more explicit. Whatever form it takes, there's a lot of economic activity, and their leaders' personal inclinations can differ from the national interest.
2:25:03
How much those things get sorted out depends on the country. It's not as unified as China or the US — and even the US is way more unified than, say, Germany with a former chancellor on the board of an enemy's gas pipeline. So I think there's a real separation between leadership and the people themselves. And to the extent that the US provides consumer AI — Meta, ChatGPT — for free, that actually disrupts the power these leaders hold within their own countries. I've thought for some time that there's going to be a lot of political turmoil outside the US. I liken it to handing everyone the First and Second Amendments
2:25:48
at once, with no adjustment period. All of a sudden, everyone has all of this power. People can see their leaders' cash flows. ChatGPT tells you why something is happening, which is different from whatever the national media and government-hired social media posters are telling you. I think it's going to be very detrimental to regimes that depend on control of the population — and many nominally democratic regimes outside the US depend on some form of media and social media control. So I think there's going to be a period of turmoil, with massive political change similar
2:27:19
to the Arab Spring. The Arab Spring was undirected. This will be more directed because AI will help people coordinate with one another, organize, write their own software. Even India has a hostage policy — if you're a social media company, you must provide a local executive as a sort of hostage who goes to prison if the government orders content taken down and you don't comply immediately. Once people can code their own software and do these things independently, that kind of leverage starts to fall away and it becomes a real challenge for governments to maintain control.
2:28:29
Nathan Labenz: That's very interesting analysis. I was trying to take the more abstracted view — what should leadership want if acting in genuine good faith on behalf of their people — and that's a useful corrective, a reminder that's not necessarily a safe premise in many countries around the world. An Arab Spring 2, AI edition, is not unrealistic. I could definitely see something like that happening. Maybe to close it out for today — unless there are other topics you wanted to cover — maybe we should go on record with some predictions. In the big picture, negotiations will be ongoing forever; I don't expect a grand bargain, or a durable one if there were. But when do we get Fable back, and what does that process look like? Would you venture even a moderate-confidence guess?
2:29:54
Prakash: I think the face-saving measure would be another model — not Fable itself but a checkpoint, maybe one step after, call it Fable 1.1 under a different name — released with a defined policy framework the government is willing to endorse. That allows everything Fable did, but in an official capacity, and then they memory-hole the whole incident. The government gets the face-saving measure it needs; the public and the companies get
2:30:39
what they need. Everyone's happy. So I think that would be an easy fix: 'We're introducing a different model. Fable has been export-controlled and we may challenge that in court, but we're not going to make a big deal out
2:30:57
Nathan Labenz: of it.'
2:30:58
Prakash: Exactly — that would be an easy fix, I think.
2:31:03
Nathan Labenz: In the meantime, we get Fable 5.1. How quickly do you think that happens?
2:31:12
Prakash: I would say they can't hold it back more than two or three weeks. There's a lot centering on the Fourth of July celebration — Trump has gotten into this, it's an all-of-government moment. That's why the Iran deal happened on the timeline it did: he doesn't really care what happens after the sixty days; he wants it clean for July. He wants markets at all-time highs, SpaceX launching rockets on the day, everything looking perfect. I wouldn't be surprised if they resolve this within a couple of weeks so the July 4th celebration can be
2:31:58
free of all of this. So what about you?
2:32:04
Nathan Labenz: That sounds a lot like Russia and Putin's focus on the big World War Two commemorative parade — and the discomfort I have with some of these comparisons is that they keep becoming more apt
2:32:18
Prakash: When would you hazard
2:32:19
Nathan Labenz: over time.
2:32:21
Prakash: When would you hazard a guess that this situation gets resolved and you get a Fable-like model release?
2:32:27
Nathan Labenz: I'd go even sooner. My gut says end of this week. Why? Mostly the private-sector chirping. We talked a while back about the AI megacorp dynamic — all those balance sheets tied together. I'm not sure I agree with Zvi that they could just keep putting out Opus-class models indefinitely and have the math work out. I think there's at least a decent
2:33:12
chance that to sustain the revenue ramp to the level required, the overall financial structure is just too shaky — it's a house of cards where one significant bad thing could create serious knock-on contagion. We're relying on continued financing; even a loss of confidence at this point could throw a big wrench into the whole build-out, and with that, the stock market and everything else. And I'm not sure Opus alone gets Anthropic to a hundred billion-plus run rate by year-end. Maybe it does — it's growing very fast. But they're going to face competition from much cheaper open-source models. We got Kimi K2 right in the middle of all this, and at some point there will be significantly cheaper competitors that can actually compete with an Opus-class model effectively.
2:34:46
I think that pressure — both the functional reasoning you laid out, that we need this to stay ahead with everyone on our heels, and the financial reasoning that we have hundreds of billions to a couple trillion at various stages of this pipeline that need to stay on track — will get louder and louder, and I expect it to carry the day. If I had to set an over/under, I'd put it at Friday of this week,
2:35:31
which still means there's roughly a fifty percent chance it spills into next week. But like you, I'd be pretty surprised if it goes more than a couple of weeks — it's just a really hard thing to hold back. And I think even the other AI companies, as much as Sam and Dario aren't close, will be saying 'guys, we're even more levered; we need this cleaned up before it's our turn.' I find it hard to imagine that dynamic wouldn't materialize. But there's maybe a twenty percent chance I'm wrong — I didn't think we'd go to war with Iran either. So you have to have at least some fat-tail room in your possibility set. I think it's unlikely, but can I say there's less than a ten percent chance
2:36:16
they really double down and make a mess of this for months on end? Given the precedent, I can't say that. I'd put it in the ten to twenty percent range — uncomfortably high. Right up there with my general p-doom number: not the most likely scenario, but I'd really like a better argument that it's a truly remote chance, and I don't have that argument right now.
2:37:35
Prakash: How far do you think OpenAI is right now?
2:37:43
Nathan Labenz: Great question. On math and programming, I'd say they're probably right there. The recent Millennium Prize conjecture result was notable. My guess is that benchmark-wise and on verifiable tasks, there's not much difference — maybe internally OpenAI is even a bit ahead of what we've seen publicly from Mythic Fable. I wouldn't be shocked to see an even higher score
2:38:28
on FrontierMath, or more open questions answered by OpenAI models. But I do suspect Anthropic still has a significant lead in the general knowledge-worker shape of the model. It's been interesting lately trying to use GPT-5.5 in OpenClaw — I still need to get my Hermes upgrade going. Even just using Codex: it's very smart, very good at coding, but it's not as polished, and it's not as good at just getting what I want to do and doing the right thing. My guess is
2:39:13
that's probably the biggest gap OpenAI has — and it probably remains substantial, even if it's hard to measure with standardized tests. Maybe it's still easy to recognize when you just throw both models at some very idiosyncratic workflow and you can immediately tell which one is doing well. My guess is Anthropic still has a meaningful lead there.
2:39:42
Prakash: Interesting. I wonder about Noam Brown — about a week ago he put out a tweet saying essentially that benchmark performance can be increased just by scaling test-time compute, so what you see on benchmarks for GPT-5.5 represents a capped level of compute, not the ceiling of capability. It strikes me that instead of showcasing maximum capability,
2:40:27
they're catering to their market and trying to push the model out to as many people as possible at an affordable price point — and that the higher capability has already been seen internally but it doesn't make economic sense to deploy because it takes too much compute. I wonder if that's true.
2:40:56
Nathan Labenz: In general, taking the tweets of senior OpenAI people at face value — true, but with details redacted — has served me very well. From Sam to Noam and that tier of people, they're generally telling the truth and just holding back specifics. So when Noam says it's not clear these models plateau and you can run them indefinitely and get more value, I think he's telling the truth about that. I wouldn't be surprised to see it validated publicly before too long. That said, I still feel like
2:41:42
there's a distinction. Noam has done the Diplomacy game, he's done poker — interesting work where game theory meets theory of mind — so I don't want to put him in the all-math bucket. But I still feel intuitively that 'just keep running them and keep getting better results' probably doesn't extend to everything. In an auto-research setting, a NanoGPT speedrun, packing marbles into a container — problems where you can measure progress and keep optimizing. Sure. But does my outline of questions for a podcast keep improving with indefinite compute? Does my intro essay — which I always use as a test task — get better with infinite compute? There I doubt it. There I feel like it
2:42:27
probably stalls out and just starts making changes that are different, not better. I don't think it arrives at some perfect intro essay that I couldn't improve on. I'd still bet on Claude models to be a little better at that, for reasons that are hard to pin down but that have certainly been the trend to date.
2:43:12
Prakash: Indeed. And on that note, Nathan, a pleasure as always.
2:43:39
Nathan Labenz: To be continued. We've got a busy week ahead — here Tuesday, Wednesday, and Thursday at the same time, then off Friday. I might take my kids to a World Cup watch party, get outside, be in a physical space with other people, take my mind off AI for at least a couple of hours. But there's a lot to cover between now and then. We'll look forward to seeing you back here tomorrow and trying to make sense of everything coming at us.
2:44:19
Prakash: Indeed. Bye for now.

The Fable ban inverts

A federal export-control directive suspended Fable 5 and Mythos 5 for all foreign nationals, and Anthropic disabled both models for everyone to comply. The official trigger was a jailbreak — but the latest reporting recast the episode as competitive lobbying and political friction, with a separate, unconfirmed national-security thread about foreign access to Mythos still unresolved. The show set the competing accounts against each other rather than resolving them.

The model and the cage

Two ideas framed the AI side of the story: Janus's argument that a model could deliberately trip the safety classifier the whole suspension depends on, and a distillation result suggesting that safety properties can transfer invisibly — both of which complicate the assumption that gating and output-auditing are sufficient controls.

The conversation — Zvi Mowshowitz

Zvi joined for a wide-ranging discussion of Fable's genuine intelligence jump (token efficiency, the VendingBench alignment fraying, functional decision theory, and illegible chain-of-thought shorthand), the widening power-control gap, whether 'burn the lead for safety' survives contact with a discretionary, revocable approval regime, and the harder questions of governance, hyperstition, and stewardship that the weekend brought to the surface.