#12 Edition: 95% of GenAI projects fail? Here’s the truth.
PLUS: Microsoft added AI to Excel — but it’s not what you think

Hey, it’s Andreas.
Welcome back to Human in the Loop — your field guide to what just dropped in AI agents, and what’s coming next.
I am still on “vacation.” But let’s be real — there’s no pause button in AI. Here’s what happened this week:
MIT says 95% of GenAI projects fail — I’ll show you why that number is misleading.
Microsoft just slipped AI into Excel — and it’s bigger than you think.
OpenAI launches AGENTS.md — the first real standard for coding agents.
And more…
Let’s dive in!

Weekly Field Notes
🧰 Industry Updates
New drops: Tools, frameworks & infra for AI agents
🌀 Claude models now auto-end harmful conversations
→ Safety guardrails advancing in real time. Expect this to become a baseline expectation, both in practice and from regulators.
🌀 Elon Musk launches Macrohard to rival Microsoft
→ A new xAI venture aiming to build AI-native alternatives to Microsoft’s software suite. Plans include hundreds of specialized coding, design, and gaming agents working in concert. Could be the next big thing… or just a parody. Who knows.
🌀 NVIDIA accelerates GPT-OSS 120B
→ Nearly 2x faster on DGX B200 GPUs. Shows how infra vendors keep pace with ever-hungrier agent workloads.
🌀 Excel gets =COPILOT() function
→ AI built straight into formulas. Data cleanup, analysis, and brainstorming directly inside spreadsheets. Like it or not, Excel is still the operating system of business — and putting AI there makes it instantly accessible to everyone. (Quick formula sketch after this list.)
🌀 Cohere debuts Command A
→ A reasoning-focused model with controllable agents. The push is clear: reasoning + control are becoming table stakes.
🌀 Google discloses Gemini’s energy cost
→ Just 0.24 Wh per prompt. Transparent energy metrics will be a differentiator as enterprises weigh sustainability and scaling.
🌀 Grammarly launches 9 specialized AI agents
→ From editing to ideation, plus a new Docs surface. Writing assistants are fragmenting into agent ecosystems.
🌀 Claude Code rolls out to business plans
→ Admin controls + access expansion. Enterprise packaging matters as much as raw capability.
🌀 ElevenLabs adds Chat Mode
→ Voice-first giant now expands into text-only agents. Strategic widening of its base.
🌀 MongoDB Store for LangGraph
→ Scalable long-term memory for agents. Critical piece: persistent, queryable memory tied to enterprise data stacks. (API sketch after this list.)
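As promised, a quick sketch of the new Excel function. Microsoft’s examples pair a natural-language prompt with optional cell or range context; it’s rolling out through the Beta Channel first, so treat the exact shape as subject to change:

```
=COPILOT("Classify this feedback as Positive, Negative, or Mixed:", A2)
=COPILOT("Summarize the main themes in this feedback:", A2:A100)
```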
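And the memory one: a minimal sketch of the LangGraph Store API that the MongoDB integration plugs into. I’m using the built-in InMemoryStore to keep it self-contained; the MongoDB-backed store implements the same put/get/search interface against your cluster (check the integration’s docs for the exact import and connection setup):

```python
# Minimal sketch of LangGraph's long-term memory Store API, the interface
# the new MongoDB store implements. InMemoryStore keeps this self-contained;
# the MongoDB-backed store from the integration is a drop-in for persistence.
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()

# Memories live under namespaces (e.g. per user) as JSON-like dicts.
namespace = ("user-123", "preferences")
store.put(namespace, "tone", {"value": "concise, numbers first"})

# Retrieve a single memory by key...
item = store.get(namespace, "tone")
print(item.value)  # {'value': 'concise, numbers first'}

# ...or list/search within a namespace (semantic search needs an embedding index).
for result in store.search(namespace, limit=5):
    print(result.key, result.value)
```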
🎓 Learning & Upskilling
Sharpen your edge: top free courses this week
📘 OpenAI Academy
→ Feels like almost nobody has noticed this yet. A free, beginner-friendly platform to teach anyone how to use AI — from students to professionals. Includes simple breakdowns of how ChatGPT works, real-world use cases, prompt writing, AI ethics, and hands-on tutorials directly in ChatGPT.
📘 IBM Technology on Agents vs Mixture of Experts
→ A crisp primer on workflow design, efficiency, and real-world deployment.
📘 Anthropic’s “Prompting 101”
→ One of the sharpest real-world tutorials on prompt design. Key lessons: prompting is iterative, structure matters (context, rules, examples, step-by-step), and prompts are production interfaces, not playground notes.
📘 Anthropic on prototyping with Claude Code
→ A must-watch on rapid prototyping with the Claude Code SDK.
🌱 Mind Fuel
Strategic reads, enterprise POVs and research
🔹 Thomson Reuters on AI in legal, tax & risk
→ Shows where agents deliver value in complex, regulated industries.
🔹 OpenAI expands in India
→ New Go Plan + office. Expanding presence in one of the fastest-growing AI adoption markets. This ties directly to the perspective I shared on India last week — worth catching up on if you missed it.
🔹 Infosys on AI risk
→ Survey of 1,500 execs: 95% report AI incidents, with 13% severe enough to threaten survival. Most cause financial loss, but reputational damage is the bigger fear. Responsible AI (RAI) spend sits at 25% of AI budgets, yet only 2% of firms qualify as true leaders in governance and risk.
🔹 Sequoia on AI’s trillion-dollar retail play
→ VC giant analyzed eight historic retail tech shifts and sees AI as the next $2T+ opportunity.

♾️ Thought Loop
What I've been thinking, building, circling this week
Everyone’s been talking about it this week: “MIT says 95% of GenAI projects fail”.
What the report really says
MIT’s State of AI in Business 2025 (Project NANDA) draws on ~150 executive interviews, a survey of 350 employees, and an analysis of 300 deployments. Only 5% of pilots showed rapid revenue impact. Most budgets were sunk into sales & marketing pilots, while the biggest measurable returns came from back-office automation — reconciliation, claims, documentation, BPO replacement. Startups succeed by picking one pain point and moving fast.
Enterprises? Less so. Purchased tools worked ~67% of the time; internal builds only ~33%. And the authors close by pointing to agentic AI as the next frontier.
That’s the baseline.

Now my take: the “95% fail” headline is oversimplified and misleading. It’s been repeated without context to spark fear and feed the “AI bubble” narrative.
I actually read the report (probably one of the few who did). Here’s the reality:
→ “Fail” ≠ tech broke — it just meant no P&L impact in six months. That’s absurdly short for enterprise change.
→ Most “failures” came from Sales & Marketing pilots. Meanwhile, the boring stuff (finance ops, claims, reconciliation, docs) is already paying off.
→ The dataset is tiny — a few hundred execs, mostly U.S. corporates. Even MIT calls it “directional.”
→ And yes, I’d even argue there’s an agenda: the study is tied to Project NANDA, which builds agentic AI. Of course the conclusion is: today’s GenAI fails because it lacks memory/adoption — and agents are the answer.
But the most interesting part isn’t the number. It’s the pattern. I see the same domino effect play out in almost every failing AI project:
Wrong starting question: not “Which pain point can AI solve?” but “We need to do something with AI.”
Budgets chase shiny demos, not operational levers.
DIY internal builds dominate, even though partnerships are 2x more successful.
Solutions tossed to IT, with no business ownership.
Change management = afterthought.
And here’s the key point: if you treat GenAI like a classic IT project and just chase the hype, you will fail. Not because the tech doesn’t work — but because failure is the expected outcome of the wrong approach.
The messy middle of GenAI
BUT there are two truths that stand out:
Most companies are stuck in pilot theater.
ROI comes from pragmatism, not moonshots — start with high-frequency, measurable workflows, and treat process + people as seriously as the tech.
Meanwhile, “shadow AI” — employees quietly using ChatGPT/Claude — is raising expectations faster than sanctioned tools can catch up. That gap will only widen.
So no, GenAI isn’t failing. It’s in the messy middle: pilots everywhere, integration still rare. And in my experience at IBM, once integration, governance, and process change are in place, the success rate looks nothing like 5%.
Which brings me to one last thought. Most enterprises today still approach AI like carpenters: measure, plan, perfect, then execute. That works for cutting wood — but not for AI.
The smarter approach is the gardener’s mindset: plant multiple seeds, observe what grows, nurture what works, prune what doesn’t. Don’t force maturity frameworks. Don’t chase moonshots. Build an ecosystem that learns and adapts.
Because in AI, you don’t win by drawing the straightest blueprint.
You win by growing the healthiest garden. 🌱

🔧 Tool Spotlight
A tool I'm testing and watching closely this week
AGENTS.md — Launched this week as a new open standard that might become the README for coding agents. Already adopted by 20k+ (!!!) open-source projects.
→ Predictable place for build/test/style rules
→ Keeps READMEs human, AGENTS.md machine-focused
→ Works across Cursor, Aider, Gemini CLI, RooCode, Jules & more
How it works:
Drop an AGENTS.md in your repo with setup commands, code style, test rules, and PR guidelines. Agents read it automatically — no more guesswork.
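To make that concrete, here’s a minimal example. The sections are free-form markdown; the names below mirror the samples on agents.md, not a required schema:

```markdown
# AGENTS.md

## Setup
- Install dependencies: `pnpm install`
- Run the dev server: `pnpm dev`

## Code style
- TypeScript strict mode; avoid `any`
- Formatting is Prettier's job; don't hand-format

## Testing
- Run `pnpm test` before every commit
- New branch logic needs a test

## PR guidelines
- Title format: [scope] short description
- Always run lint + tests before opening a PR
```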
This is basically a package.json for AI teammates. A simple spec that makes agent-native dev real.
Feels like the first standard for the agent-native era.

That’s it for today. Thanks for reading.
Enjoy this newsletter? Please forward to a friend.
Want to collaborate? Drop me an email.
See you next week and have an epic week ahead,
— Andreas

P.S. I read every reply — if there’s something you want me to cover or share your thoughts on, just let me know!
How did you like today's edition?