#4 Edition: Inside the AI Agent Race: A dispatch from San Francisco

PLUS: Major AI Agent updates from OpenAI, Atlassian & more

Hey, it’s Andreas.
And welcome back to Human in the Loop. I just landed back in Europe, super energized after a few intense days in San Francisco last week. Let’s dive straight into it!

This week’s edition covers:

  • OpenAI demos a UI-testing agent that clicks through apps like a human, Windsurf drops an AI-native browser that sees your terminal and IDE, and Starbucks quietly rolls out GenAI barista assistants across 34,000 stores.

  • What I saw in 70 hours in San Francisco — and why it’s becoming the capital of AI agents.

  • and much more…

Let’s go!

Weekly Field Notes

🧰 Industry Updates
New drops: Tools, frameworks & infra for AI agents

🌀 Windsurf drops Wave 10, launching their own AI-native browser
→ A Chromium-based browser that syncs with your IDE + terminal — so the AI sees what you see. That basically means less copy-paste and a more context-aware dev flow. Big fan of Windsurf — they’re building good stuff.

🌀 OpenAI demos UI-testing agent using CUA
→ A glimpse into autonomous QA (frontend) — AI that clicks, tests, and adapts across interfaces. It uses Playwright to spin up a browser instance and navigate to the web app to be tested.
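
To make that loop a bit more concrete, here is a minimal sketch of how model-proposed actions might be replayed against a Playwright page. This is not OpenAI's demo code: the `Action` type and the `execute` and `run_in_browser` helpers are hypothetical names for illustration. The only real Playwright calls are `sync_playwright`, `chromium.launch`, `new_page`, `page.goto`, `page.mouse.click`, and `page.keyboard.type`, and actually launching a browser assumes Playwright is installed.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Action:
    """One step proposed by the model (hypothetical shape, for illustration)."""
    kind: str            # "goto", "click", or "type"
    url: str = ""
    x: int = 0
    y: int = 0
    text: str = ""


def execute(page: Any, action: Action) -> None:
    """Apply a single action to a Playwright-style page object."""
    if action.kind == "goto":
        page.goto(action.url)
    elif action.kind == "click":
        page.mouse.click(action.x, action.y)
    elif action.kind == "type":
        page.keyboard.type(action.text)
    else:
        raise ValueError(f"unsupported action: {action.kind}")


def run_in_browser(actions: list[Action]) -> None:
    """Spin up a headless Chromium instance and replay the actions."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        for action in actions:
            execute(page, action)
        browser.close()
```

In a full agent loop, a screenshot of the page would presumably go back to the model after each step so it can decide the next action. That part is left out here.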

🌀 Starbucks rolls out Green Dot Assist, a GenAI-powered in-store assistant for baristas
→ Answers ops questions in real time via iPad, freeing up time for service instead of searching manuals. This looks like a strong pattern, and I expect other companies to adopt similar approaches soon.

🌀 The Browser Company launches Dia, an AI-first browser now in closed beta
→ Ships with an integrated agent that sees every tab, adapts to your habits, and takes actions for you. Early signs of where ambient software is going.

🌀 Atlassian launches Rovo Dev CLI, an AI agent for your terminal
→ Write, refactor, debug, and generate docs — all in natural language. Pulls in Jira + Confluence context so you stay in flow from idea to deployment. Atlassian remains one of the most essential tools in enterprise software development — and it feels like they’re finally starting to catch up with AI.

🎓 Learning & Upskilling
Sharpen your edge - top free courses this week

📘 Build GenAI pipelines with Airflow 3.0
Hands-on course from Astronomer
→ Learn to turn your prototype into a production-ready workflow — with retries, scheduling, observability, and task orchestration built in. Perfect if you're scaling beyond notebooks.

P.S. Got a good new course? Send it my way on LinkedIn or just hit reply.

🌱 Mind Fuel
Strategic reads, enterprise POVs and research

🔹 McKinsey drops their first Agentic AI Playbook for CEOs
→ 80% use GenAI, but most see no impact. McKinsey’s fix? Agents — not copilots. Scaling needs new workflows, governance, and trust. This isn’t pure optimization anymore — it’s about creating a new operating model.

🔹 Google releases “The AI Agent Handbook”
→ 10 practical hacks to use agents across research, marketing, coding, and ops. Real-world use cases + how to get started with Agentspace.

🔹 Anthropic responds to Apple — with Claude Opus as lead author
→ Apple said last week that reasoning is just pattern-matching. Anthropic says: test harder. Claude shines when tasks get complex, and the real issue is your benchmark.

🔹 Stanford & CMU challenge chain-of-thought prompting
→ Direct answers often beat verbose reasoning. For agents, faster and cleaner might be the smarter path.

🔹 Stanford & DeepMind explore EvoPrompt, a Darwinian method to evolve prompts without retraining
→ Agents that learn and adapt on their own — no gradients, just evolution.
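
For a feel of what "no gradients, just evolution" means, here is a toy sketch of an evolutionary loop over prompts. It is not the paper's implementation: the phrase pool, the `score` function, and all helper names are made up for illustration. A real system would typically score prompts on a dev set and use an LLM to do the mutation and crossover.

```python
import random

# Hypothetical building blocks a prompt can be assembled from.
PHRASES = [
    "Answer briefly.",
    "Think step by step.",
    "Cite your sources.",
    "Use bullet points.",
    "Ask before acting.",
]


def score(prompt: tuple[str, ...]) -> int:
    """Stand-in fitness; in practice this would be accuracy on a dev set."""
    target = {"Answer briefly.", "Cite your sources."}
    used = set(prompt)
    return len(target & used) - len(used - target)


def mutate(prompt, rng):
    """Swap one phrase for a randomly chosen one."""
    i = rng.randrange(len(prompt))
    return prompt[:i] + (rng.choice(PHRASES),) + prompt[i + 1:]


def crossover(a, b, rng):
    """Take a prefix from one parent and the suffix from the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(generations=20, pop_size=8, length=2, seed=0):
    """Evolve a population of prompts; returns (best prompt, best score per generation)."""
    rng = random.Random(seed)
    population = [
        tuple(rng.choice(PHRASES) for _ in range(length)) for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        parents = population[: pop_size // 2]  # the best half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score), history


best, history = evolve()
print(best, history[-1])
```

Because the best half of each generation survives unchanged, the best score never gets worse from one generation to the next. Selection pressure does the work, and no model weights are touched.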

♾️ Thought Loop
What I've been thinking, building, circling this week

Just came back from San Francisco.

I spent ~70 hours there — and it felt like a full sprint.
Two major conferences (AMD & Databricks), a panel with smart minds from OpenAI, Anthropic, and Salesforce, and nonstop convos around agentic AI with people from IBM, SAP, ElevenLabs, Scale AI, and more.

SF right now? Pure AI pressure cooker.

Every billboard? AI.
Every convo? AI.
Every launch? Agents.

If you’ve ever wondered what it looks like when the future arrives unevenly — it’s this. SF is in full cyberpunk mode.

AMD: Betting on Agentic AI

The Advancing AI event in San Jose was direct and focused — a clear look at AMD’s roadmap for the years ahead, and a pointed challenge to NVIDIA’s dominance.

CEO Lisa Su kicked things off by naming agentic systems and inference as AMD’s key growth drivers — projecting a $500B TAM by 2028. She described agentic AI as systems that will increasingly require a tight integration of GPU and CPU — and that’s exactly where AMD is positioning itself.

To support this, AMD launched its new MI350 GPU series and teased Helios, a next-gen AI rack with 10× more inference power, built for Mixture of Experts models.

Also notable: Sam Altman made a surprise appearance on stage — and called OpenAI an early design partner for AMD’s next-gen MI450 chips, expected in 2026.

Altman is clearly chasing every chip he can get — and AMD is now part of that play. He also echoed a theme I heard again and again: training is changing fast, and inference is becoming the dominant workload.

Databricks: The Agent Builder Play

Databricks just wrapped its Data + AI Summit at the Moscone Center in San Francisco: 20,000 people, 700+ sessions, and 11 big announcements.

Which announcement stood out the most to me? Agent Bricks.

What is Agent Bricks?
Agent Bricks is Databricks' new no-code framework for building high-quality, domain-specific AI agents using your enterprise data. It auto-generates evaluation benchmarks, tunes performance, and optimizes for cost and quality — so you can move from prototype to production without manual trial-and-error.

It’s no coincidence — every major provider is now rolling out their own no-code, in-house AI agent builder (Salesforce, ServiceNow, AWS etc.).

Agents are on track to become the default interface for enterprise software — and every stack wants to own the layer where they’re built, optimized, and deployed.

FYI: There were 700+ other sessions. If you are deep into the Databricks ecosystem, they are worth a look. Everything is available for free.

Regulating the AI Agent Era?

I also joined a panel on compliance and regulation of AI agents — with leaders from OpenAI, Anthropic, Salesforce, and Schellman.

Key questions on the table:
– How do you regulate autonomous agents?
– Who’s responsible when something goes wrong?
– How do we build trust into systems that make decisions on their own?

These aren’t future problems. They’re already hitting enterprise teams in risk, compliance, and audit.

Voice as the Next Interface?

One trend that kept coming up in conversations — and still feels massively underhyped: voice as the core interface for intelligent systems.

Not just voice assistants. Voice-native agents — apps where voice isn’t a feature, it’s the default interface. The more I think about it, the more I agree.
For a lot of real-world use cases, voice will replace both keyboards and chat as the dominant interaction layer.

Final Thought

70 hours in San Francisco confirmed it: We’re no longer in the prototype phase of AI agents — we’re in deployment.

🔧 Tool Spotlight
A tool I'm testing and watching closely this week

AMD launched Developer Cloud, giving anyone with a GitHub ID access to MI300X GPUs — no infra setup needed.

→ 1x–8x GPU configs (up to 1.5TB VRAM), free 25-hour trial, and affordable scaling after that.
→ Great for testing, fine-tuning, and building native AMD support into OSS projects.
→ Part of AMD’s open ecosystem strategy — tying hardware, ROCm 7, and Helios infra into one stack.
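
A quick sanity check on the 1.5TB figure, assuming 192 GB of HBM3 per MI300X (AMD's published spec for the card; the function name is just for illustration):

```python
GB_PER_MI300X = 192  # HBM3 per GPU, assumed from AMD's published MI300X spec


def total_vram_gb(gpus: int) -> int:
    """Total GPU memory in GB for a node with the given number of MI300X cards."""
    return gpus * GB_PER_MI300X


for gpus in (1, 8):
    gb = total_vram_gb(gpus)
    print(f"{gpus}x MI300X: {gb} GB ({gb / 1024:.2f} TB)")
```

The 8x config comes out at 1,536 GB, which is where the 1.5TB number comes from.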

This is likely the most under-reported announcement from AMD's Advancing AI event, but it might play a much bigger role long term. AMD is trying very hard to build its developer ecosystem. I’ve tested it. It’s clean, usable, and finally feels like they’re running, not crawling.
→ Explore AMD Developer Cloud

That’s it for today. Thanks for reading.

Enjoy this newsletter? Please forward to a friend.

See you next week and have an epic week ahead,

— Andreas

P.S. I read every reply — if there’s something you want me to cover or share your thoughts on, just let me know!
