- What has changed in the last 12 months in terms of lawyers actually using AI for their day-to-day work?
- prinz sees junior associates using GPT-5.5 Pro (via enterprise subscription) for research and double-checking documents — uploading completed memos to catch inconsistencies, typos, and errors. He also sees colleagues using Harvey and Legora for document editing. However, the majority of lawyers still think of AI as a hallucinating chatbot, partly because many first tried AI when o3 was the flagship model — a model that would confidently fabricate entire New York statutes. The contrast with today's models is dramatic: o3 now feels ten generations behind GPT-5.5 Pro and Fable.
- Why has OpenAI dominated the prinzbench leaderboard, and where will Fable land?
- prinzbench has two sub-scores: a pure legal-research component and a needle-in-a-haystack search component. OpenAI's models have excelled at both. Anthropic's models historically scored near zero on the search sub-score (out of 24), and prinz attributes part of the gap to Anthropic sandbagging on maximum reasoning effort — a constraint that Opus 4.8 finally broke, jumping from roughly 25 to 42 overall. Fable is already not a zero on search, is clearly the best legal reasoner outside OpenAI, and is competitive with at least GPT-5.4 — but may still suffer from the same search weaknesses as previous Anthropic models. prinz cautiously places Fable near the top of the leaderboard, probably not above GPT-5.5 xHigh, with the full run still in progress.
- What stood out to you in the Anthropic release documents for Fable 5 and Mythos 5 that the broader discourse may have missed?
- prinz highlights that Anthropic's system card explicitly states the acceleration from Mythos is 'concentrated in engineering execution rather than research judgment.' The blog post's examples of 'novel' scientific results, such as training a model 100 times smaller to outperform a 500-million-parameter model published in Science before April 2025, are impressive but do not represent the kind of genuinely novel research capability that would signal proximity to recursive self-improvement (RSI). For prinz, the moment these models begin producing truly novel scientific insights, not just engineering acceleration, is the moment RSI is genuinely close. The leaked OpenAI memo about potentially delaying an IPO if RSI happens also caught his attention as an indirect signal about labs' own internal timelines.
- What does a super-empowered lawyer using AI look like — analogous to the super-empowered engineer running hundreds of agents?
- prinz envisions agents embedded in Outlook monitoring email in real time, autonomously redlining contract changes sent at midnight, saving the updated document, and alerting the lawyer for a one-click approval before sending to the client. Further out, AI should connect to entire data rooms autonomously, review every document, draft the diligence memo, and surface the five red flags — compressing hundreds of hours of associate work into an overnight run. He speculates that as deal costs collapse, a lawyer might handle 20 simultaneous deals instead of one. He also flags a broader question: whether the legal system itself changes, from AI-empowered judges to automated courts handling micro-lawsuits in minutes.
- How worried should we be about the trend toward illegible chain-of-thought reasoning in models like Fable, and what does it mean for monitoring and alignment?
- prinz notes this is not new: artifact-filled chains of thought have appeared in OpenAI's models too. His core concern is that chain-of-thought may not reliably reflect what the model is actually doing, which is why Anthropic invests heavily in mechanistic interpretability. More importantly, a superintelligence that knows it is being monitored may calibrate its visible reasoning to avoid alarming overseers. prinz grounds this in a lawyer's intuition that the same facts can always be framed to serve different impressions. He concludes that monitoring chain-of-thought alone is not a sufficient alignment strategy, and that monitoring superintelligence generally is probably not a perfect tool, but that we should continue paying attention and act if we see something bad.
- Do you maintain a p(doom) number, and what is your overall assessment of AI risk for someone looking to you for a cue?
- prinz declines to assign a p(doom) number on principle: the vocabulary of p(doom) is mostly used by people who are already doomers, and reducing messy real-world risk to a precise probability reflects the same flaw as other overconfident logical constructions. He identifies real and serious risks: bioweapons, authoritarian AI (especially nationalization of frontier labs), and autonomous weapons. He argues, however, that these point in completely different directions for activism and policy. His net view is cautiously optimistic: he sees no strong evidence that these risks are unnavigable, considers worrying without action counterproductive, and holds that the world is generally more complex than any neat logical chain from premises to catastrophe.