Opening: Cognitive Empathy for the Administration, Dean Ball to OpenAI, GLM 5.2
Nathan opened with a weekend recalibration: after Judd Rosenblatt called out the AI safety community for a lack of cognitive empathy toward the Trump administration, Nathan acknowledged he had probably been too harsh. Even if export controls are a sledgehammer applied to the wrong nail, the administration is engaged and taking AI seriously, and the constructive response is to educate rather than ridicule. Prakash extended the point with Warren Buffett's 29-year lag before investing in Google. The example shows how wide the gap is between people inside the AI bubble and everyone outside it, and it suggests the administration's clumsy moves may reflect genuine unfamiliarity rather than bad faith.
Nathan then recapped his extended Cognitive Revolution conversation with Dean Ball, who is leaving the White House OSTP on July 6 to lead OpenAI's new Strategic Futures team. The team is chartered around catastrophic risk, recursive self-improvement, labor disruption, and lab–government relations. Nathan flagged an irony: Ball advocates broad diffusion over nationalization, yet he has concluded that effective frontier-AI policy work may require the inside access that only comes from working at a lab. Prakash closed the opening segment with GLM 5.2 from Zhipu. The Chinese open-weights model was released June 16 and dominated weekend discourse after Matt Velloso (former Meta/Google DeepMind VP) declared it the first open model that can pass as a daily driver. His post briefly lifted Zhipu's Hong Kong–listed shares.
Interview: The State of AI Engineering — swyx
Shawn Wang (swyx) coined 'the AI Engineer' in his 2023 essay 'The Rise of the AI Engineer' and now curates the AI Engineer World's Fair while doing evaluation research at Cognition. He joined for a wide-ranging conversation on the state of the field, as reflected in the 30 tracks he programmed for this year's World's Fair. Software factories have replaced coding agents as the organizing frame. The RAG track has been renamed 'search and retrieval'. Continual learning sits at an unresolved fork between weight-update partisans and systems-side pragmatists.
The FrontierCode benchmark, swyx's evaluation project at Cognition, anchored a long sequence on measuring production-ready code quality. It was inspired by METR's finding that roughly half of SWE-Bench-passing PRs are unmergeable slop. Results show Fable scoring roughly 2.5× higher than Opus at less than twice Opus's token cost. The conversation then widened into ecosystem structure. The adviser/router pattern (OpenRouter Fusion, Sakana Fugu) is theoretically limited because a weak model doing the routing doesn't know what it doesn't know. Cloud infrastructure is being rebuilt for agents, and 10–15× scaling stress is straining GitHub and the broader SaaS stack. Enterprises are also beginning to vibe-code their own internal systems of record rather than pay for a dozen $20/month subscriptions, each with a siloed chatbot bolted on.
swyx was unusually candid on the bear case for synchronized lab IPOs. In this 'Illuminati group chat' reading, insiders may be selling to retail investors at peak hype. He tempered it with an honest bull case: consumer agent adoption is probably still 100× underpenetrated. The segment closed on the CS enrollment collapse (Stanford −42%, Berkeley −61%) and swyx's career advice. The job market for juniors is genuinely hard, but plenty of members of technical staff at frontier labs are 21 years old. His advice was to demonstrate taste, ship interesting things, and practice 'learning in public.'
Close: Guess the Markets
After the swyx interview, Nathan and Prakash ran a live 'Guess the Markets' segment. It covered twelve AI-related prediction-market questions drawn from Polymarket, Kalshi, and Manifold, and neither host had seen the questions or market values in advance. Questions included whether Anthropic would hold the top Chatbot Arena rank through 2026 (both guessed ~55–60%; market said 63.5%). Another asked which company leads in math benchmarks (both said OpenAI at ~50%; the market gave Google 73%, surprising both hosts). The hosts split on whether any company would formally announce AGI before 2028 (Nathan 10%, Prakash 80%, market 45%). On whether any model would hit 1550+ Chatbot Arena Elo in 2026, both guessed 80% while the market said 18%, prompting a discussion about Arena score saturation. The segment also covered OpenAI's IPO prospects, Anthropic's valuation versus OpenAI's, NVIDIA's market-cap leadership, and GLM 5.2 topping LM Arena. Nathan called it an experiment worth refining and floated posting the quiz on the show website; Prakash suggested logarithmic scoring for a future round.