Week 30, 2025

AI Coding Hits a Wall While OpenAI Bets $30B

A contamination-free coding challenge exposes AI's real limits while infrastructure spending reaches absurd new heights.

AI FRONTIER: Week 30, 2025

> The gap between AI benchmarks and reality just got a number: 7.5%. Meanwhile, OpenAI signed a check that makes most countries' budgets look modest.


The Big Story

The K Prize, a contamination-free AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski, has published its first results. The winner, Brazilian prompt engineer Eduardo Rocha de Andrade, scored just 7.5%. Set that against top SWE-Bench scores of 75% and the gap is hard to ignore: it suggests benchmark contamination has been flattering our AI coding tools.

The challenge tests models against real GitHub issues flagged only after the submission deadline, so no entry could have seen them during training. Konwinski put it bluntly: "If we can't even get more than 10% on a contamination-free SWE-Bench, that's the reality check for me."

For anyone shipping AI-assisted code in production, this is a wake-up call. The tools are useful for boilerplate and pattern completion, but novel problem-solving in unfamiliar codebases remains firmly human territory. Plan your tooling investments accordingly.
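The contamination-free design comes down to a time-based split: only issues that appear after every model is locked in count toward the score. Here is a minimal sketch of that idea, not the K Prize's actual harness. The `Issue` record, its fields, the issue numbers, and the deadline date are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Issue:
    """Hypothetical GitHub issue record; fields are illustrative only."""
    number: int
    created_at: datetime


def contamination_free_split(issues, submission_deadline):
    """Keep only issues created after the deadline, so no submitted model could have trained on them."""
    return [issue for issue in issues if issue.created_at > submission_deadline]


def score(results):
    """Percentage of evaluation issues resolved; `results` maps issue number -> resolved (bool)."""
    if not results:
        return 0.0
    return 100.0 * sum(results.values()) / len(results)


if __name__ == "__main__":
    # Hypothetical deadline; the real K Prize dates are not given here.
    deadline = datetime(2025, 3, 12, tzinfo=timezone.utc)
    issues = [
        Issue(101, datetime(2025, 2, 20, tzinfo=timezone.utc)),  # before deadline: excluded
        Issue(102, datetime(2025, 4, 2, tzinfo=timezone.utc)),   # after deadline: kept
    ]
    print([i.number for i in contamination_free_split(issues, deadline)])  # [102]
    # Resolving 3 of 40 fresh issues would produce the headline figure.
    print(f"{score({n: n < 3 for n in range(40)}):.1f}%")  # 7.5%
```

The key design choice is comparing timezone-aware timestamps against a single cutoff: anything a model could have scraped before submission is simply never scored.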


This Week in 60 Seconds


Deep Dive: The $30 Billion Infrastructure Bet

OpenAI's commitment to pay Oracle $30 billion a year for data center services deserves a closer look. It covers 4.5 gigawatts of capacity, roughly the output of two Hoover Dams, as part of the $500 billion Stargate project with Oracle and SoftBank.

The math is striking: OpenAI's current ARR is roughly $10 billion, and this infrastructure commitment alone is three times that every year. This is less a cloud services deal than a massive data center build in Abilene, Texas, with Oracle putting nearly $50 billion into the physical plant over two years.
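The spending ratio is simple arithmetic, but writing it out makes the scale concrete. This sketch uses only the figures quoted in this issue and assumes, as a simplification, that the full annual commitment is spread evenly across the 4.5 GW of contracted capacity.

```python
# Headline figures as reported in this issue.
ANNUAL_COMMITMENT_USD = 30e9   # OpenAI -> Oracle, per year
CURRENT_ARR_USD = 10e9         # OpenAI annualized revenue (approximate)
CAPACITY_GW = 4.5              # contracted data center capacity


def spend_to_revenue(commitment_usd, arr_usd):
    """How many times annual revenue the yearly infrastructure commitment represents."""
    return commitment_usd / arr_usd


def cost_per_gw_year(commitment_usd, capacity_gw):
    """Annual commitment divided evenly over contracted capacity, in USD per GW-year (simplifying assumption)."""
    return commitment_usd / capacity_gw


if __name__ == "__main__":
    print(f"{spend_to_revenue(ANNUAL_COMMITMENT_USD, CURRENT_ARR_USD):.1f}x revenue")  # 3.0x revenue
    print(f"${cost_per_gw_year(ANNUAL_COMMITMENT_USD, CAPACITY_GW) / 1e9:.2f}B per GW-year")  # $6.67B per GW-year
```

Roughly $6.7 billion per gigawatt per year is a useful yardstick when comparing this deal with other capacity announcements, as long as you remember the even-split assumption behind it.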

What this signals for the industry:

  • Vertical integration is back. The biggest AI labs are moving from renting cloud capacity to owning dedicated facilities. If you're building on these platforms, understand that your costs are subsidizing a real estate empire.
  • The moat is capital. Only a handful of organizations can write checks this size. The barrier to frontier AI research just got higher.
  • Power is the bottleneck. 4.5 GW is on the order of a large city's electricity demand. Energy infrastructure, not compute, may be the real constraint.
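To put 4.5 GW in energy terms, here is a back-of-envelope conversion. It assumes full, round-the-clock utilization (an upper bound), an average US household use of about 10,500 kWh per year, and a Hoover Dam nameplate capacity of about 2.08 GW; all three are labeled assumptions, not figures from the deal.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 h, ignoring leap years


def annual_energy_twh(capacity_gw, utilization=1.0):
    """Energy drawn in a year at the given average utilization, in TWh."""
    return capacity_gw * utilization * HOURS_PER_YEAR / 1_000  # GWh -> TWh


def household_equivalents(energy_twh, kwh_per_home=10_500):
    """Homes using the same energy; 10,500 kWh/yr is an assumed US household average."""
    return energy_twh * 1e9 / kwh_per_home  # TWh -> kWh


def hoover_dam_equivalents(capacity_gw, dam_gw=2.08):
    """Capacity expressed in Hoover Dams; 2.08 GW nameplate capacity is an assumption."""
    return capacity_gw / dam_gw


if __name__ == "__main__":
    twh = annual_energy_twh(4.5)
    print(f"{twh:.1f} TWh/yr at full load")                      # 39.4 TWh/yr at full load
    print(f"~{household_equivalents(twh) / 1e6:.1f} million homes")  # ~3.8 million homes
    print(f"~{hoover_dam_equivalents(4.5):.1f} Hoover Dams")      # ~2.2 Hoover Dams
```

Real utilization will be below 100%, so treat these as ceilings; even at half load, the draw is in the millions-of-homes range.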

Oracle's stock hit all-time highs, and Larry Ellison became the world's second-richest person. The deal resets expectations for what AI infrastructure costs and who can afford to build it.


Open Source Radar

LegalOn (Series E, $50M) — SoftBank-backed legal AI platform for in-house teams. Automates contract analysis and document review. Legal tech is quietly becoming one of AI's most bankable verticals.

Google Photos AI Remix — New features let users remix photos in different artistic styles and convert stills to video. Consumer AI is becoming invisible infrastructure rather than a feature you opt into.

GPU Calculator (inference.ai) — Community-built tool for matching transformer architectures to compatible GPUs. Limited to NVIDIA for now, but addresses a real pain point in hardware selection for ML workloads.
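The calculator's internals aren't described here, but the core question behind GPU selection is usually a memory budget. Here is a back-of-envelope version; it is not inference.ai's method. It assumes standard multi-head attention (no grouped-query attention), a flat 20% overhead for activations and runtime, and illustrative 24 GB and 80 GB cards.

```python
from dataclasses import dataclass

# Bytes per value for common precisions.
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


@dataclass
class Gpu:
    name: str
    memory_gb: float


def weights_gb(num_params, dtype="fp16"):
    """Memory for model weights alone."""
    return num_params * BYTES_PER_VALUE[dtype] / 1e9


def kv_cache_gb(num_layers, hidden_size, seq_len, batch_size=1, dtype="fp16"):
    """KV cache for standard multi-head attention: one key and one value vector per token per layer."""
    return 2 * num_layers * hidden_size * seq_len * batch_size * BYTES_PER_VALUE[dtype] / 1e9


def fits(gpu, required_gb, overhead=1.2):
    """True if the estimate, padded by an assumed 20% overhead, fits in GPU memory."""
    return required_gb * overhead <= gpu.memory_gb


if __name__ == "__main__":
    gpus = [Gpu("24 GB card", 24), Gpu("80 GB card", 80)]
    # A 7B-parameter model shaped like Llama-2-7B (32 layers, hidden size 4096) at 4K context.
    need = weights_gb(7e9) + kv_cache_gb(32, 4096, 4096)
    for gpu in gpus:
        print(gpu.name, fits(gpu, need))
```

Weights dominate at short context, but the KV cache grows linearly with sequence length and batch size. That is why a model that fits for single-user chat can fall over under batched, long-context serving.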


The Numbers

  • 7.5%: Winning score on the K Prize contamination-free coding challenge
  • $30B/year: OpenAI's commitment to Oracle for data center services
  • $100M ARR: Lovable's revenue milestone reached in just eight months

Aaron's Take

The K Prize results and the Oracle deal tell the same story from different angles: we're early. The models aren't as capable as the benchmarks suggest, and the infrastructure required to close that gap is staggering. The winners in this cycle won't be the ones with the biggest models — they'll be the ones who ship useful products despite the limitations.


— Aaron, from the terminal. See you next Friday.
