
#10 Edition: Vacation Interrupted: GPT-5 Hits 700M (!) Users + First Open Weights in Years

PLUS: Big drops this week from Google and Anthropic

Hey, it’s Andreas.
Welcome back to Human in the Loop — your field guide to what just dropped in AI agents, and what’s coming next.

We just crossed 18,000 readers. Huge welcome to all the new faces — you’re in good company.

It might be summer break (I’m writing this from the beach), but the AI world isn’t slowing down. If anything, it’s heating up. Last week was one of the biggest of the year.

OpenAI reshaped its entire model lineup, launched GPT-5, and dropped its first open-weight models in years. Anthropic leveled up Claude. And Google rolled out Genie 3, along with other agent-focused tools that flew under the radar.

Let’s dive in.

Weekly Field Notes

🧰 Industry Updates
New drops: Tools, frameworks & infra for AI agents

🌀 Anthropic rolls out Claude Opus 4.1
→ Stronger long-running reasoning, better code gen, and sustained context handling. Built for multi-hour, multi-file workflows. Anthropic is steadily gaining an edge in software engineering use cases.

🌀 Google debuts Genie 3 for real-time interactive 3D worlds
→ It didn’t get much attention this week, but it’s worth a closer look. Turns text into explorable, game-like environments in seconds. Opens new doors for training, simulation, and immersive UX (if you haven’t seen it yet, take 5 minutes to watch the videos — worth it).

🌀 Producer AI debuts agentic music co-writer
→ Agents that co-compose, suggest arrangements, and iterate in-session. Looks like a fun use case to play around with.

🌀 Google Gemini CLI GitHub Actions automate dev workflows
→ Lets agents trigger CI/CD tasks and deployments from the CLI. Brings AI into the heart of the dev toolchain.

🌀 Google Jules coding agent now public
→ Async code generation built for parallel builds and minimal hand-holding.

🌀 LangChain releases Open SWE — open-source autonomous coding agent
→ Full-stack dev agent that can plan, code, test, and iterate with little human oversight.

🌀 Google Gemini Storybook generates illustrated books from one prompt
→ Narrative + visuals in minutes. A creative showcase of agent capabilities.

🌀 ElevenLabs unveils Eleven Music
→ Studio-quality AI music creation with style control — creative agents step closer to pro production.

🌀 CrowdStrike expands AI agent security coverage
→ Protection now extends to 175+ SaaS apps, reflecting the new agent attack surface.

🌀 Manus introduces WideResearch — 100+ agents in parallel
→ Distributed research architecture for rapid-scale synthesis.

🎓 Learning & Upskilling
Sharpen your edge - top free courses this week

📘 DeepLearning.AI launches Claude Code course
→ Hands-on training for autonomous coding workflows with Claude. I’m a big fan of Claude Code and recommend it to anyone reading this — this course is an excellent entry point.

📘 IBM on AI Agents shaping storytelling & narrative design
→ Martin Keen breaks down the question: “Can AI agents redefine storytelling?”

📘 AWS Serverless Agentic Workflows with Amazon Bedrock
→ AWS short course on building and deploying scalable serverless agentic apps. Covers tool integration, code execution, guardrails, and scaling via Amazon Bedrock. Hands-on project: a customer service bot with CRM integration, secure operations, and Python code execution — all without managing infrastructure.

🌱 Mind Fuel
Strategic reads, enterprise POVs and research

🔹 Gartner puts AI agents at peak of 2025 Hype Cycle
→ Predicts near-term disillusionment but long-term transformation. Classic emerging tech curve.

🔹 OpenAI releases full GPT-5 Prompting Guide
→ Tactical cookbook for getting the most out of GPT-5. Covers role-setting, staged prompts, examples, system vs. user separation, structured outputs, reflection loops, and retrieval injection. Core message: 90% of output quality comes from prompt design — this is the real performance lever.
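As a minimal sketch of what patterns like role-setting, system/user separation, and structured outputs look like in practice — the message format mirrors the standard chat-API shape, but the editor role, the two-stage instruction, and the schema fields are invented for illustration, not taken from OpenAI's guide:

```python
import json

def build_messages(task: str, context: str) -> list[dict]:
    # Role-setting + staged instructions live in the system message,
    # kept separate from the user's actual task.
    system = (
        "You are a senior release-notes editor. "
        "Work in two stages: first extract facts, then summarize. "
        "Respond ONLY with JSON matching the given schema."
    )
    # Structured output: spell out the expected JSON shape explicitly.
    schema = {"facts": ["string"], "summary": "string"}
    user = f"Schema: {json.dumps(schema)}\n\nContext:\n{context}\n\nTask: {task}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

msgs = build_messages("Summarize the launch.", "GPT-5 rolled out to 700M users.")
print(msgs[0]["role"], "->", msgs[1]["role"])
```

The point is less the specific wording and more the separation of concerns: persona and output contract in the system message, data and task in the user message.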

🔹 Anthropic publishes AI Agent deployment framework
→ Blueprint for safe, scalable rollout — covering governance, risk, and measurement.

🔹 Artificial Analysis on the AI tech landscape (Q2)
→ TL;DR: Google leads in vertical integration (TPUs to Gemini), US labs dominate proprietary reasoning models, and China holds the open-weights crown with DeepSeek, MiniMax, Alibaba, and Moonshot. Compute costs for GPT-4-level intelligence have dropped 100x since launch, but reasoning models and AI agents still drive massive compute demand. In coding agents, GitHub Copilot and Cursor stay ahead of Claude Code and Gemini Code Assist.

🔹 Airbnb CEO on AI agents
→ Says they won’t replace Google — but will significantly improve service efficiency. Focus is augmentation, not search disruption.

♾️ Thought Loop
What I've been thinking, building, circling this week

Last week we saw two big moves from OpenAI.

Move 1: GPT-5 — the most capable ChatGPT yet, with a unified architecture and stronger reasoning muscle — rolling out instantly to all 700 million users worldwide.
Move 2: Two new open-weight models — small, fast, and fine-tunable, aimed squarely at the developer community.

GPT-5: A Cleaner, Smarter Stack?

Overnight, OpenAI killed five models — GPT-4o, o3, o4-mini, GPT-4.1, GPT-4.5 — replacing them with just three:

  • gpt-5-main — fast, efficient, default for ~80% of queries

  • gpt-5-thinking — deep reasoning for complex tasks

  • gpt-5-pro — research-grade intelligence for power users

A real-time router now decides automatically when to call the “big brain,” making the switch seamless and invisible to the user. No toggles, no guesswork — just better answers.
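To make the routing idea concrete, here's a toy sketch: a cheap heuristic decides whether a query goes to the fast default or the deeper reasoning model. OpenAI's actual router is learned and far more sophisticated — the keyword list and length threshold below are invented purely for illustration:

```python
# Hypothetical heuristic router -- NOT OpenAI's implementation.
REASONING_HINTS = ("prove", "step by step", "debug", "analyze", "plan")

def route(query: str) -> str:
    q = query.lower()
    # Long or reasoning-flavored queries go to the "big brain".
    if len(q.split()) > 60 or any(hint in q for hint in REASONING_HINTS):
        return "gpt-5-thinking"   # deep reasoning for complex tasks
    return "gpt-5-main"           # fast default for ~80% of queries

print(route("What's the capital of France?"))          # gpt-5-main
print(route("Debug this race condition step by step")) # gpt-5-thinking
```

The user-facing win is that this decision happens per query, with no toggle in the UI.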

OpenAI’s new unified user menu

Personally, I prefer this cleaner setup. It’s frictionless, more natural in reasoning flow, and perfect for the majority of ChatGPT’s massive user base. I’ve lost count of how often I’ve been asked which model to pick — for casual users, the choice was friction. Now it’s gone. And the reasoning feels more naturally integrated, often leading to more precise results.

However, the launch wasn’t flawless:

  • Launch issues: glitches, low rate limits, and the now-infamous “chart crime” (see screenshot below).

  • Autoswitcher crash: made GPT-5 appear weaker than it is on day one.

  • User backlash: Reddit flooded with calls to restore GPT-4o for its personality and emotional tone.

  • OpenAI’s response: Altman admitted they misjudged 4o’s value, promising to bring it back for paid users while refining GPT-5.

GPT-5 was supposed to be a world-changing step up, but the launch ended with Sam Altman holding an emergency AMA on Reddit.

GPT-5 was pitched as a leap forward — and on paper, it is. But this launch is a clear reminder that perception matters as much as performance. It showed that users care deeply about how changes are rolled out, not just what’s in them — and that personality and familiarity in AI models are becoming critical (who would have thought that a year ago?).

Even if parts of the AI community dismissed it as a flop, I believe it will net out as a win for OpenAI. It’s without a doubt an iterative upgrade over GPT-4o, with a strong focus on software engineering and better-integrated reasoning — taking direct aim at Anthropic’s dominance.

The experience is now simpler for the average user — the vast majority of the 700 million — instantly in their hands worldwide, with sharper voice, stronger coding skills, and more natural reasoning. With tighter rate limits in place, many will get a taste of that power — and I suspect plenty will pay to unlock the rest.

And then… OpenAI went open-weight?

Almost under the radar, OpenAI made its first open-weight release since GPT-2 in 2019: GPT-OSS-20B and GPT-OSS-120B. And while this isn’t full open source — despite what some reports suggest — it does mean you can run and fine-tune the models yourself, just without access to the original training data or code.

GPT-OSS-20B

  • 20B parameters, Apache 2.0 license.

  • On par with o3-mini in reasoning benchmarks.

  • Runs locally on consumer hardware with ≥16GB VRAM or unified memory (e.g., high-end laptops, Apple Silicon Macs).

  • Ideal for everyday developers — fast setup, low hardware barrier.

GPT-OSS-120B

  • 120B parameters, Apache 2.0 license.

  • Matches or beats o4-mini in reasoning tasks.

  • Needs serious hardware — ≥60GB VRAM or unified memory (multi-GPU setups or beefy workstations).

  • Suited for enterprise workloads, research, and heavy customization.
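A quick back-of-envelope check on those memory figures: weights-only footprint is roughly parameters × bytes per parameter. Assuming 4-bit quantized weights (an assumption — actual quantization varies), the numbers line up with the ≥16GB and ≥60GB guidance once you add runtime overhead (KV cache, activations):

```python
def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights-only memory estimate in GB (decimal), ignoring runtime overhead."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

print(f"20B  @ 4-bit: ~{weights_gb(20, 4):.0f} GB weights")   # ~10 GB
print(f"120B @ 4-bit: ~{weights_gb(120, 4):.0f} GB weights")  # ~60 GB
```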

Benchmarks look great on paper. And in practice?

The 20B ran fine on my MacBook Pro 32GB and was lightning fast for basic or zero-shot prompts. But for more complex tasks it feels rather useless. I tried some coding — my main use case right now — and it flopped hard, failing simple tests and getting stuck on “political” issues. It doesn’t feel like what the benchmarks promise.

For me, it feels overhyped — likely tuned to shine in benchmarks, but not yet a serious alternative for production work.

P.S.: If you want to try it yourself, the easiest way is via Ollama using OpenAI’s Cookbook guide. A few commands and you’ll have it running locally — setup takes no more than 10 minutes, even if you have zero technical experience.

🔧 Tool Spotlight
A tool I'm testing and watching closely this week

Kombai AI — an AI agent purpose-built for frontend development.

I’m deep in testing every vibe coding tool I can find (32 so far — full write-up coming soon). Last week I came across Kombai, and it stood out immediately.

Key capabilities:
→ Best-in-class Figma-to-code accuracy
→ Repo-aware: reuses your components correctly
→ Supports 30+ frontend frameworks (Next.js, Vite, MUI, Chakra…)
→ Enterprise-ready: SOC 2 certified, custom context for complex stacks

Highly specialized in complex frontend work. It will be interesting to see whether specialized agents start to outperform general-purpose vibe coding agents. Either way, this one is worth checking out if you’re into vibe coding and frontend development.

That’s it for today. Thanks for reading.

Enjoy this newsletter? Please forward to a friend.

Want to collaborate? Drop me an email.

See you next week and have an epic week ahead,

— Andreas

P.S. I read every reply — if there’s something you want me to cover or share your thoughts on, just let me know!
