
Hey, it’s Andreas.
Last week's poll on my planned AI Agent Mastermind Cohort blew past what I expected: 800+ people answered yes, but the old settings didn't capture who said yes or what you wrote. So I'm running the poll once more, properly this time. If you raised your hand before, please do it again here (more info will follow this week).
A quick reminder of what it is: not a basic overview course. A weekly live format where we build from day one - build real AI Agents to make you more productive, automate real parts of your work, and even build AI products of your own. It will be intentionally small and selective - for business professionals who are done watching AI happen from the sidelines and want to build. If that's you, please add yourself to the waitlist.
Want to be the first to hear about Agentic AI Mastermind?
In today's issue:
• OpenAI adds Plugins and Sites to Codex
• Microsoft runs an AI Skills Fest with free certification vouchers
• Anthropic calls for a pause mechanism before self-improvement arrives
• Stanford study finds AI legal tutors outperform professors in blind tests
• Why Claude Code and Codex together are the ultimate unlock
Let's get into it.

Weekly Field Notes
🧰 Industry Updates
🌀 OpenAI adds Plugins and Sites to Codex → Codex now gets role-specific Plugins for workflows like design, data analysis, and app integration, plus Sites for building shareable apps with databases, storage, env vars, and access controls.
🌀 OpenAI upgrades ChatGPT memory with “dreaming” → ChatGPT now builds a structured, evolving profile across work, travel, hobbies, and preferences. The bigger play: better personalization, stronger continuity, and more lock-in.
🌀 Microsoft uses Build developer conference to unveil a new full AI stack → Microsoft introduced seven MAI models, Scout as an always-on work agent, and Project Solara for agent-first devices.
🌀 Ideogram and Reve push image models toward controllable design → Ideogram 4.0 and Reve 2.0 both move beyond prompt re-rolls into layout-aware, iterative image generation. Ideogram makes the open-weight case stronger with top-tier typography and design quality, while Reve’s segment-level editing points to the next UI: images edited more like structured layouts than static outputs.
🌀 White House explores public stake in OpenAI → The U.S. government is reportedly discussing a 1-5% equity stake in OpenAI, potentially routed into a public wealth fund.
🌀 NVIDIA unveils RTX Spark for local AI agents → A new Windows laptop and desktop platform for running agents locally instead of defaulting to the cloud, with Blackwell RTX, a 20-core Arm CPU, and up to 128GB unified memory.
🎓 Learning & Upskilling
📘 Microsoft Learn runs AI Skills Fest with free certification vouchers → You need to complete an AI Skills Fest playlist to earn a Credly badge and unlock a free Microsoft Certification exam voucher. Solid low-friction upskilling option for anyone building AI skills with official Microsoft material.
📘 Anthropic shows when to use tools, skills, or subagents → Anthropic walks through how to decompose an overgrown agent into tools, skills, and subagents, using evals after each change. Practical framework for anyone moving from prompt-heavy prototypes to maintainable agent systems.
📘 Peter Steinberger shows how to build tools for agentic development → At Microsoft Build, OpenAI/OpenClaw’s Peter Steinberger shared how his team uses small internal tools to close issues, manage API limits, spin up test environments, automate reviews, and run agents in parallel.
📘 DataCamp shares OpenClaw cheat sheet → A quick starter guide for OpenClaw, covering CLI commands, scheduled briefings, multi-channel automation, Ollama setup, and multi-agent workspaces.
🌱 Perspectives & Research
🔹 Anthropic calls for a pause mechanism before self-improvement arrives → The report argues that frontier labs need a way to slow or pause development jointly, with verification across countries. The hard part: nobody has built credible verification for AI training pauses yet, and the incentives to defect are enormous.
🔹 AI Lab CEOs back DNA screening rules for biosecurity → Sam Altman, Dario Amodei, Demis Hassabis and other AI leaders signed an open letter backing mandatory screening for synthetic DNA orders and DNA printers. The logic is pragmatic: AI models can be bypassed, but dangerous biological work still needs physical DNA - and that supply chain can be checked before it ships.
🔹 Stanford study finds AI legal tutors outperform professors in blind tests → In a contract-law tutoring test, faculty preferred answers from Gemini 2.5 Pro and NotebookLM over professor responses 75% of the time. The signal is serious: AI is moving from passing exams to competing in judgment-heavy teaching scenarios.
🔹 ChinaTalk maps the gray market for U.S. AI model access in China → A network of proxy vendors reportedly gives Chinese developers cheap access to OpenAI, Claude, Gemini, and other restricted models. The risk is not just sanctions leakage - users may get weaker routed models, lose prompt data, and feed a parallel distillation market outside normal governance.

♾️ Thought Loop - What I've been thinking, building, circling this week
Throughout this year and the last, my go-to choice for writing code has consistently been Claude Code. I've used it extensively, written comprehensive guides and setups, incorporated it into university courses and even started writing a book about it. Then Codex moved to GPT-5.5 in late April, while Anthropic rolled out the Opus 4.8 update. After that, the gap that had mattered a lot to me became much smaller (I wrote about that here).
So I did the obvious thing and ran both for a while. Some days I used Codex (GPT-5.5) for everything. Some days I switched back to Claude Code (Opus 4.8). I ran the same kinds of tasks through each tool in isolation: a messy migration, a small auth change, building a landing page, a refactor across a handful of files, a feature that started from vague requirements. Codex was fast and literal, very good at doing exactly what I asked. Claude Code was the better environment for the parts before the code exists: scoping, working through ambiguity, holding a longer implementation together without losing the thread. But the more I used them together, the better my results became. If you're into agentic coding and aren't using both together, you're missing out.
Model monoculture
The reason this works is not that two models are smarter than one. It is that they are wrong in different places.
When a single model writes code and then reviews its own code, you are asking the same training distribution to catch its own blind spots. It mostly can't. The errors it is prone to making are exactly the errors it is prone to missing. I like to call this model monoculture: one model, one set of priors, one shared failure mode running end to end, looking clean the whole way. The dangerous property of AI-generated code is not that it is bad. It is that it looks correct before it is correct, and a self-review preserves that illusion rather than breaking it.
A differently-trained second reader breaks it. Codex flags things Claude waved through; sometimes Claude pushes back and is right, sometimes it reassesses and fixes. The disagreement is the signal. (This is the same reason we don't have authors copy-edit their own books, and it is roughly why peer review exists, imperfect as it is.) You are not buying a smarter coder. You are buying an argument.
An easy way to leverage both models
For a while the two-model loop was annoying enough that I didn't always bother: write in Claude Code, copy context into Codex, copy feedback back, ask Claude to fix. Every hop costs momentum, and momentum is most of the value of these tools.

The thing that made it stick for me is OpenAI's official Codex plugin for Claude Code (openai/codex-plugin-cc), which is a slightly strange object: a tool from one lab that pulls a competitor's model into the other lab's terminal. Once installed it adds a set of /codex: slash commands to the Claude Code session, so the second opinion lives where I already work. It needs Node.js 18.18+ and either a ChatGPT subscription or an OpenAI API key; /codex:setup checks you are authenticated.
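If you want to sanity-check the Node.js requirement before installing, here is a minimal sketch. It only compares the output of `node --version` against the 18.18 minimum mentioned above. The helper names are my own for illustration and are not part of the plugin, and it does not replace /codex:setup, which is what checks authentication.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Minimum taken from the requirement quoted in the text (Node.js 18.18+).
MIN_NODE: Tuple[int, int, int] = (18, 18, 0)


def parse_node_version(text: str) -> Tuple[int, int, int]:
    """Parse a string such as 'v20.11.1' into (20, 11, 1)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version string: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def node_is_new_enough(text: str, minimum: Tuple[int, int, int] = MIN_NODE) -> bool:
    """True if the reported version is at least the required minimum."""
    return parse_node_version(text) >= minimum


def installed_node_version() -> Optional[str]:
    """Return the output of `node --version`, or None if Node.js is unavailable."""
    if shutil.which("node") is None:
        return None
    try:
        result = subprocess.run(
            ["node", "--version"], capture_output=True, text=True
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = installed_node_version()
    if version is None:
        print("Node.js not found on PATH")
    elif node_is_new_enough(version):
        print(f"Node.js {version} meets the 18.18+ requirement")
    else:
        print(f"Node.js {version} is older than 18.18")
```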
Three of the commands do most of the work, and they map cleanly onto the risk of the change:
• /codex:review is the routine pass: correctness, missing tests, edge cases, whether the implementation actually matches the instruction. This is the everyday quality gate I use the most to improve Claude Code quality.
• /codex:adversarial-review is the one I actually use the most. It is built to question design decisions and hunt for the failure that ships to production. I point it at anything I would lose sleep over (auth, permissions, migrations, payment flows, anything touching customer data). This has also become an important part of my daily workflow.
• /codex:rescue is for when Claude Code is stuck in a loop, patching symptoms instead of the cause. A second model looking at the same problem from a different angle is often a faster fix than better prompting.
The shape of my loop now is small and boring, which is the point: Claude plans, Claude builds, Codex reviews (or adversarially reviews), Claude evaluates the findings and fixes only the valid ones, Codex re-reviews, and I approve. Sometimes Codex also handles the rebuild itself, and this is happening more and more. When it does, I use Claude to review the build again and catch anything Codex may have missed.
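The loop above can be sketched in a few lines. This is an illustration of the control flow only, not how the plugin works internally: `build`, `review`, and `triage` are hypothetical callables standing in for "Claude builds", "Codex reviews", and "Claude evaluates the findings", and the `max_rounds` cap is my own assumption so the loop cannot run unattended forever. The final approval stays with a human and is deliberately left outside the function.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    summary: str


@dataclass
class LoopResult:
    change: str   # the latest version of the change
    rounds: int   # how many review rounds were run
    clean: bool   # True if the last review left no valid findings


def review_loop(
    build: Callable[[List[Finding]], str],
    review: Callable[[str], List[Finding]],
    triage: Callable[[str, List[Finding]], List[Finding]],
    max_rounds: int = 3,
) -> LoopResult:
    """Builder builds, reviewer reviews, builder fixes only the valid findings."""
    change = build([])  # first build: nothing to fix yet
    for round_number in range(1, max_rounds + 1):
        findings = review(change)          # second model reads the change
        valid = triage(change, findings)   # builder keeps only findings it accepts
        if not valid:
            return LoopResult(change, round_number, clean=True)
        change = build(valid)              # fix the valid findings, then re-review
    # Cap reached with open findings: stop and let a human decide.
    return LoopResult(change, max_rounds, clean=False)


# Stub agents for demonstration; real ones would call the actual tools.
def demo_build(fixes: List[Finding]) -> str:
    return "v1" if fixes else "v0"


def demo_review(change: str) -> List[Finding]:
    if change == "v0":
        return [Finding("missing null check"), Finding("rename this variable")]
    return []


def demo_triage(change: str, findings: List[Finding]) -> List[Finding]:
    return [f for f in findings if "null" in f.summary]


result = review_loop(demo_build, demo_review, demo_triage)
print(result)
```

With the stubs above, the first review raises two findings, triage accepts one, the rebuild produces "v1", and the second review comes back clean.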
I do not run the full thing for every change (that would be overkill, and it would drain limits for no reason). I run it when the blast radius is large.
A word of warning: the plugin can also enforce reviews automatically through a review gate, and that gate can spin up a long-running Claude/Codex loop that quietly burns your usage. Useful for high-risk work you are actively watching. A bad idea to leave on for everything. Automation is leverage; blind automation is just a faster way to make a mess and burn your tokens.
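One way to make "match the review to the blast radius" concrete is a small lookup that decides which command to reach for. This is a sketch under my own assumptions: the path patterns are hypothetical keywords taken from the risk areas listed earlier (auth, permissions, migrations, payments, customer data), and `plan_review` is an illustrative helper, not something the plugin provides. Only the two slash-command names come from the plugin itself.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable

# Hypothetical patterns; adjust them to how your own repository is laid out.
HIGH_RISK_PATTERNS = (
    "*auth*",
    "*permission*",
    "*migration*",
    "*payment*",
    "*customer*",
)


@dataclass
class ReviewPlan:
    command: str     # slash command to run inside Claude Code
    full_loop: bool  # whether to run the build/review/fix/re-review loop


def is_high_risk(path: str) -> bool:
    """True if the path matches any of the high-risk patterns."""
    lowered = path.lower()
    return any(fnmatchcase(lowered, pattern) for pattern in HIGH_RISK_PATTERNS)


def plan_review(changed_paths: Iterable[str]) -> ReviewPlan:
    """Pick the review level from the riskiest file in the change."""
    if any(is_high_risk(path) for path in changed_paths):
        return ReviewPlan("/codex:adversarial-review", full_loop=True)
    # Everything else gets the routine pass, without the full loop.
    return ReviewPlan("/codex:review", full_loop=False)


print(plan_review(["src/auth/session.py", "README.md"]))
print(plan_review(["docs/changelog.md"]))
```

The first call returns the adversarial review with the full loop because one file sits under an auth path; the second falls back to the routine pass.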
What this means
If I had to compress what I have changed into something portable, it is this. Stop treating these tools as products to choose between, and start treating them as roles in a system.
The practical version is short. Let one agent build and one agent doubt, and make sure they are different models, because two instances of the same model mostly share a blind spot. Match the level of review to the risk of the change, not to a habit. And hold onto the one job that does not delegate, which is deciding, with full context, that the thing is actually correct.
I am not certain how durable any of this is. The models keep moving, the plugin is new, and a year from now the right division of labor may look different. But the underlying shift seems stable to me: as the models get better at writing code, the scarce skill stops being writing it.
Pick the best model for each job, not one model for all of them. And when two of them together beat either one alone, build the workflow that exploits it.
P.S. You don’t need to use the plugin. You could also switch between apps, build everything in Codex and then review it with Claude Code, or simply copy and paste the review results into the right terminal. There are different workflows. The plugin above worked best for me because I did not have to switch out of Claude Code. It felt like an integrated, natural part of my workflow. You should try it out, but also experiment and see what works best for you.

That’s it for today. Thanks for reading.
Enjoy this newsletter? Please forward to a friend.
See you next week, and have an epic week ahead,
- Andreas

