
Hey, it's Andreas.
We are living through one of the most interesting moments in human history. One person with the right tools can now build what used to take an entire team, in less time and for less money. Agentic AI is bending that curve faster still. Every few weeks lately I hit a genuine wow moment, something that was clumsy a month ago suddenly just working.
Today I want to share one of those: building loops. It is one of the more interesting shifts I have run into this year, and the deep dive in this edition is the hands-on version.
A quick note on The Agentic AI Cohort. It is full. Thank you, truly. I am a little obsessed with putting this one together, and I think it is going to be brilliant.
If you wanted in, two things. I have opened a waiting list for the next edition, so if the timing was off this round (a lot of you are on summer break), add your name and you will get a few perks when it opens. And for the current cohort starting July 7, one last seat reopened about an hour ago. If you want to grab it on short notice, just hit reply. Any questions, reply too.
Now, on to this week.
In today's issue:
OpenAI launches GPT-5.6 behind a government-gated preview
IBM pushes chip scaling below 1 nanometer
Oracle University launches a free Agentic AI Foundations certification
Anthropic accuses Alibaba of large-scale Claude distillation
Plus: Loop Engineering 101, why the best builders stopped writing prompts

Weekly Field Notes
🧰 Industry Updates
🌀 OpenAI launches GPT-5.6 behind a government-gated preview → GPT-5.6 Sol, Terra and Luna are rolling out first to vetted partners after U.S. government security concerns.
🌀 OpenAI unveils Jalapeño, its first custom AI chip → OpenAI is moving deeper into the AI stack with Jalapeño, an inference chip built with Broadcom to cut serving costs and reduce dependence on rented GPU capacity.
🌀 Sakana AI launches Fugu as a multi-model orchestration layer → Fugu routes each request across a pool of models through one API, pitching orchestration as a way to reach frontier-level performance without depending on one restricted model.
🌀 SpaceX turns Colossus into a $6.3B AI compute business → Reflection AI will rent Nvidia GB300 capacity from SpaceX’s Colossus 2 data center in a deal worth up to $6.3B. Selling compute may be just as valuable as building the frontier model.
🌀 IBM pushes chip scaling below 1 nanometer → IBM unveiled a 0.7 nm nanostack chip technology that packs nearly 100B transistors into a fingernail-sized chip, promising up to 50% more performance or 70% better energy efficiency.
🌀 Hang Ten Systems raises $32M to rebuild enterprise AI services → Former Infosys CEO Vishal Sikka’s new startup is using agentic code generation, reusable skills libraries and domain expertise to cut the cost and time of enterprise software delivery.
🌀 Anthropic launches Claude Tag as a shared AI teammate inside Slack → Teams can now tag @Claude in Slack channels, giving one shared Claude its own memory, tools and task history.
🌀 Anthropic gets Mythos 5 cleared for critical infrastructure teams → Washington is allowing Mythos 5 back for 100+ vetted U.S. organizations defending critical infrastructure, while Fable 5 remains restricted.
🎓 Learning & Upskilling
📘 Oracle University launches free Agentic AI Foundations certification → A new six-module learning path covering agent fundamentals, LangChain, MCP, OpenAI’s Agents SDK, OCI Enterprise AI Agents and Oracle AI Database.
📘 DeepLearning.AI + VocalBridge launch 7-Day Voice AI Builder Challenge → A free hands-on challenge to build AI coding assistants that call you when they need human input. Nice practical exercise for human-in-the-loop agent workflows: terminal agents, voice escalation, real-time feedback and a public leaderboard.
📘 Claude Tag in action and walkthrough → A hands-on walkthrough showing how Claude Tag actually works inside Slack: shared memory, channel context, ambient updates and background tasks. Worth watching if the Claude Tag launch made you curious.
📘 Boris Cherny on the five archetypes of AI-native teams → As engineering, product, design and data science blur, Boris Cherny maps future teams around roles like Prototyper, Builder, Sweeper, Grower and Maintainer. Useful lens for anyone building with agents: the best teams will be balanced by workflow archetype, not just job title.
🌱 Perspectives & Research
🔹 Figma CEO Dylan Field on the future of product-building → In his Config 2026 keynote, Dylan Field framed Figma’s next chapter around a bigger shift: design is no longer just screens, but motion, code, AI tools and product workflows on one canvas.
🔹 Anthropic accuses Alibaba of large-scale Claude distillation → Anthropic says Alibaba-linked operators used nearly 25K fake accounts and 28.8M Claude interactions to extract advanced capabilities.
🔹 Stanford University brings orchestration to AI biology workflows with Proto → Brian Hie’s team released Proto, an open framework for composing AI biology models across DNA, RNA, proteins and ligands into one pipeline.
🔹 Ramp on why AI spend should be measured in tasks, not tokens → Ramp argues most companies are both overspending on AI and underusing it. The fix: measure value by completed work, set smarter defaults for model choice, reasoning and speed, and reserve frontier models for tasks where extra intelligence changes the outcome.
🔹 Cursor CEO Michael Truell on the future of agent-first coding → In Cursor’s Compile 26 keynote, Michael Truell framed software development around a clear shift: coding tools are becoming agent platforms. With 95% of usage now agentic, stronger cloud agents, Cursor Mobile and Origin for agent-native Git, the goal is clear: Cursor wants to own more of the software delivery loop, not just the IDE.

♾️ Thought Loop - What I've been thinking, building, circling this week
The best builders have stopped writing prompts. “Loop Engineering” is a hot buzzphrase after Boris Cherny (Claude Code’s creator) and Peter Steinberger (OpenClaw’s creator) both mentioned it on social media. Loops are now becoming a key part of how we get AI agents to iterate at length to build software.
So what is a loop, and why does it beat a prompt?
A loop hands the model a goal and a way to check itself, then lets it run: try, check, fix, until it can prove the job is done. The shape fits anything with a clear finish line: a bug ("don't quit until the tests pass"), the month's books ("reconcile until every transaction matches"), or a research brief ("keep digging until all ten questions have a source").
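To make the try, check, fix shape concrete, here is a minimal Python sketch. It is an illustration under assumptions, not how any particular tool implements loops: run_loop, make_toy_researcher and all_ten_sourced are hypothetical names, and the "agent" is a toy stand-in where a real loop would call a model.

```python
from typing import Any, Callable, Optional


def run_loop(
    attempt: Callable[[Optional[str]], Any],
    check: Callable[[Any], Optional[str]],
    max_turns: int = 10,
) -> tuple[Any, int]:
    """Try, check, fix. Stops only when check() finds nothing wrong."""
    feedback: Optional[str] = None
    for turn in range(1, max_turns + 1):
        result = attempt(feedback)  # try (or fix, guided by the last feedback)
        feedback = check(result)    # check: None means the proof holds
        if feedback is None:
            return result, turn
    raise RuntimeError(f"not done after {max_turns} turns: {feedback}")


# Toy stand-ins (hypothetical): a real attempt() would call a model.
def make_toy_researcher() -> Callable[[Optional[str]], list[str]]:
    sourced: list[str] = []

    def attempt(feedback: Optional[str]) -> list[str]:
        # Each turn finds a source for one more question.
        sourced.append(f"question {len(sourced) + 1}: source found")
        return list(sourced)

    return attempt


def all_ten_sourced(result: list[str]) -> Optional[str]:
    # The proof: all ten questions have a source. Otherwise say what is missing.
    missing = 10 - len(result)
    return None if missing <= 0 else f"{missing} questions still lack a source"


if __name__ == "__main__":
    answers, turns = run_loop(make_toy_researcher(), all_ten_sourced, max_turns=20)
    print(f"done after {turns} turns with {len(answers)} sourced answers")
```

The design choice that matters is that check() decides when the loop ends, not attempt(). The max_turns cap is only a safety net for the case where the proof never arrives.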
What It Means to Write a Loop
Writing a loop means stepping out of the driver's seat and building the thing that generates the prompts for you. The loop runs on a harness, and the harness is where most of the work lives: automations to trigger the loop on a schedule, worktrees so parallel agents don't collide, skills to freeze project knowledge, connectors so the loop can act inside real tools, sub-agents to split the maker from the checker, and memory to carry state outside the chat. Get those in place and the loop has something to stand on.

The blocks are the machinery. What turns machinery into a loop you can leave alone is proof: a loop that cannot tell good output from bad just produces wrong answers faster. So before any of mine run, I tell them exactly how to know they got it right. Here’s an example loop you can try:
The score is the proof, the ten passes are the stop. It beats a one-off "write me a post" prompt every time. One more move to make it even better: attach three or four posts you actually admire, and the result jumps again.
So how do you get started building your own loops?
Starting is almost too easy. Pick a weekly chore (the metrics recap, the Monday report you keep rebuilding), write the loop in plain English, and nail the proof: tell it exactly how to know it got the task right. Run it once, fix the one thing that is off on the first pass, then let go and just check the output. But you don't even have to start cold: here is a free Loop Library, which is full of loops to grab and bend to your own work.
Important caveat: prompting does not disappear here. The loop is still made of prompts. You have just stopped hand-cranking each one and started designing the machine that cranks them. Once a loop can judge its own work well enough, it can carry a whole workflow.
The Loops I Run
1. Build until green (goal). My workhorse. I describe the next slice of work and a condition: the feature exists, the tests pass, typecheck and lint are clean. Then I leave. The verifier checks after every turn and stops only when the condition holds, not when the agent feels finished. A one-shot agent ships its own bugs with total confidence. A goal with a real stop condition does not.
/goal implement the next item on the plan. Done when the new tests pass,
typecheck is clean, and lint has nothing left to report.
2. The janitor (loop). Every few minutes while I work on something bigger, a second loop does one small, verified piece of upkeep of its own choosing: a flaky test, a stale comment, a missing type. One change, one commit, tests green, nothing risky. The agent decides what to clean. That decision is the entire point. A hardcoded script could not make it.
/loop 5m make one small verified improvement (a flaky test, a stale
comment, a missing type). One change, one commit, tests stay green.
Touch nothing risky.
3. The overnight pass (routine). The Cherny shape, scaled to one person. While I sleep, a scheduled routine watches my open pull requests, fixes the build failures it can, answers the review comments it understands in a fresh worktree, and rebases what has gone stale. Anything ambiguous it leaves for me with a note. I wake up to work that is either finished or clearly waiting on a human decision. One caution: this routine burns a lot of tokens, so cap it.
/schedule nightly: watch my open PRs. Fix build failures, address clear
review comments in a worktree, rebase stale branches. Leave anything
ambiguous for me, with the reason.
4. The slop loop (goal). Not about code. It is my own voice spec: a skill that encodes how I write, plus a banned-phrase list (the em-dash sandwiches, the "delve," the adjective spray). A drafting agent writes, the verifier rejects anything that trips the spec, and it runs until the draft would pass the standard I apply by hand. It does not write for me. It strips the slop before I ever see the page.
/goal draft this section, then check it against my voice skill. Done when
there is no banned phrase, no AI cadence, one claim per sentence, and the
cut is at least 30 percent.
A Verifier or a Bonfire
If you run loops long enough, you will run into the same three warnings everyone else does.
The first is cost, and it is paid in tokens. Every turn is a fresh model call, so an uncapped loop just keeps burning, often with a second verifier model billing alongside it. People have torched serious money with one command left running overnight. A loop without a budget is not autonomous. It is unsupervised.
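One way to give a loop a budget is a small guard that every model call has to pass through. This is a sketch under assumptions: Budget and BudgetExceeded are hypothetical names, the token counts are made up, and in practice you would feed in whatever usage numbers your model provider reports.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when the loop has spent more than it was allowed."""


@dataclass
class Budget:
    max_tokens: int
    max_turns: int
    tokens_used: int = 0
    turns_used: int = 0

    def charge(self, tokens: int) -> None:
        """Record one model call; raise once either cap is crossed."""
        self.tokens_used += tokens
        self.turns_used += 1
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token cap hit: {self.tokens_used}/{self.max_tokens}")
        if self.turns_used > self.max_turns:
            raise BudgetExceeded(f"turn cap hit: {self.turns_used}/{self.max_turns}")


if __name__ == "__main__":
    budget = Budget(max_tokens=1_000, max_turns=5)
    try:
        while True:
            budget.charge(300)  # worker call (token count is made up)
            budget.charge(100)  # verifier call bills against the same budget
    except BudgetExceeded as stop:
        print(f"stopped: {stop}")
```

Because the charge is recorded after the call, the loop can overshoot the cap by at most one call. The worker and the verifier share one budget here, so the second model cannot quietly double the bill.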
The second is verification. A loop that cannot tell good output from bad does not save you work. It produces wrong answers faster. This is why the strongest loops put a second, independent set of eyes inside the cycle, and why /goal uses a separate model as judge instead of letting the worker mark its own homework. An agent grading itself will quietly delete the failing test and call the build green. I have watched it happen. The verifier is not a nice-to-have bolted onto the loop. The verifier is the loop. Everything else is plumbing.
The third is quieter and arrives later: comprehension debt. When the loops write the work and the verifier checks it, you can slowly stop understanding what actually shipped. That is fine until the day it is not, and you are debugging a system no human has read end to end. Staying able to read your own codebase is now a discipline, not a default.
Why I Stay in the Loop
The tidy version of this whole movement is "stop being the thing in the loop." I half agree. I should not be the one pressing enter on every turn, rebasing every branch, re-running every test. That work should run without me.
But there is a reason this newsletter is called Human in the Loop and not Human Out of It. The loops above all share a shape: they automate the doing and the checking of things that have a provable answer. Tests pass or they do not. A branch is stale or it is not. None of them decide what is worth building, which trade-off to accept, or whether the thing the agent confidently finished was the thing that mattered. Those are the decisions, and they are exactly the ones I keep, on purpose.
I believe the skill that is actually scarce in 2026 is not writing the loop. The tools write the loop. It is knowing where to stand inside it: what to define as "done," what to never let it touch, and which judgment you refuse to delegate even though you easily could.
Write the goal. Write the loop. Write the routine. Give each one a budget and a way to check itself. Then stay in the loop for the one decision the verifier can't make.

That’s it for today. Thanks for reading.
Enjoy this newsletter? Please forward to a friend.
See you next week, and have an epic week ahead,
- Andreas

