Week 4, 2026

100 Fake Citations Passed NeurIPS Peer Review

GPTZero finds AI hallucinations in 53 accepted papers, eBay bans shopping agents, and Ghostty writes the AI policy everyone needed.

AI FRONTIER: Week 4, 2026

> Peer review was designed to catch bad science, not fake citations generated by an LLM that's better at sounding plausible than being correct.


The Big Story

GPTZero scanned 4,841 of 5,290 papers accepted to NeurIPS 2025 and found over 100 AI-hallucinated citations across 53 papers (889 HN points, 472 comments). These aren't typos. They're fabricated references: invented authors, fake DOIs, real arXiv IDs linked to the wrong papers, all of which survived multiple rounds of expert review at a venue with a 24.52% acceptance rate.

GPTZero calls this "vibe citing": LLMs generating citations that splice details from real sources into plausible-looking but altered versions. NeurIPS submissions have grown 220% since 2020 (from 9,467 to 21,575), creating review strain that AI-generated fabrication exploits. Each affected paper beat roughly 16,000 rejected submissions while standing on a fabricated research foundation.

The damage compounds. Papers citing fabricated work create citation chains built on nothing. Downstream researchers implement techniques based on false premises. The integrity of AI research — published at AI's own premier venue — is undermined by the very technology it studies.

For practitioners: verify cited prior work before implementing techniques from recent publications. The era of taking citations at face value is over.
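One cheap check catches the "real arXiv ID linked to the wrong paper" pattern described above: resolve each identifier and compare what it actually points to against what the bibliography claims. A minimal sketch, assuming you have already fetched the resolved title separately (e.g. from the Crossref or arXiv metadata APIs); `looks_fabricated` and the 0.8 threshold are illustrative choices, not GPTZero's method:

```python
from difflib import SequenceMatcher

def looks_fabricated(cited_title: str, resolved_title: str,
                     threshold: float = 0.8) -> bool:
    """Flag a citation whose resolved metadata diverges from the cited title.

    resolved_title is whatever the DOI or arXiv ID actually points to,
    fetched separately. A low similarity ratio suggests the identifier
    is real but the reference is not.
    """
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    ratio = SequenceMatcher(None, norm(cited_title),
                            norm(resolved_title)).ratio()
    return ratio < threshold

# Matching titles pass; an ID that resolves elsewhere gets flagged.
print(looks_fabricated("Attention Is All You Need",
                       "Attention Is All You Need"))        # False
print(looks_fabricated("Scaling Laws for Vibe Citing",
                       "ImageNet Classification with Deep CNNs"))  # True
```

Running this over a full bibliography turns citation spot-checking into a few seconds of work per paper, which is the point: the cost of verification has to drop below the cost of fabrication.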


This Week in 60 Seconds


Deep Dive: Ghostty's AI Policy — A Template Worth Copying

The Ghostty terminal emulator published an AI governance policy (341 HN points, 163 comments) that every open-source project should read. It's not anti-AI — the maintainers use AI themselves. It's anti-low-quality-AI-contributions.

The rules:

  • All AI usage must be disclosed; undisclosed AI work triggers rejection.
  • AI-generated PRs are accepted only for pre-approved issues. Drive-by PRs referencing unapproved issues get auto-closed.
  • All AI-generated code must pass human verification, and AI cannot write code for platforms the developer can't manually test.
  • AI-generated media (art, images, video, audio) is banned entirely.

Enforcement has teeth: repeated violations result in public banning. The maintainers are blunt that "most drivers of AI are just not good enough," and low-quality AI contributions create a review burden that consumes more time than the contributions save.

This is the right framework for quality-conscious projects. AI assistance is productive when supervised by someone who understands the code. Without supervision, it generates plausible-looking contributions that shift debugging cost to maintainers. Ghostty's policy forces contributors to own that verification cost rather than externalizing it.


Open Source Radar

Qwen3-TTS — Alibaba's open-source voice family: design new voices, clone existing ones, generate speech. Unified toolkit replacing three separate systems. 659 HN points.

obra/superpowers — Now at 9,600+ GitHub stars. Agentic skills framework gaining serious traction for autonomous multi-step development tasks.

AionUi — 5,193 GitHub stars. Unified local interface for Gemini CLI, Claude Code, Codex, and others. Solves the fragmented multi-assistant problem.


The Numbers

  • 100+ hallucinated citations: Found across 53 NeurIPS 2025 papers by GPTZero
  • 220%: Increase in NeurIPS submissions since 2020 (9,467 to 21,575)
  • $1B: LiveKit valuation — voice AI infrastructure reaches unicorn status

Aaron's Take

The NeurIPS finding hits differently because it's AI research undermined by AI tools. We're using LLMs to write papers about LLMs, and the LLMs are making up the references. If the ML community can't maintain integrity in its own publications, we have a credibility problem that no benchmark score can fix. Citation verification needs to be automated and mandatory — not optional.


— Aaron, from the terminal. See you next Friday.

You Might Also Like

Browser Use vs Stagehand vs Playwright MCP Compared (2026)

Compare three approaches to AI agent browser automation. Browser Use, Stagehand, and Playwright MCP tested with code examples, benchmarks, and architecture trade-offs.

AI Engineering

OpenClaw Architecture: 8-Tier Routing & Sandbox Deep Dive

How OpenClaw routes messages across Discord, Telegram, and Slack with an 8-tier priority cascade, then isolates agent execution in pluggable Docker/SSH sandboxes.

AI Engineering

OpenClaw vs Hermes Agent: Prompt & Context Compression

Side-by-side comparison of how OpenClaw and Hermes Agent build system prompts, manage token budgets, and compress long conversations without losing critical context.

AI Engineering