posted by Narg

Just another reminder that AI doesn't work like the human brain, and that consequences can and will blindside people unexpectedly.

PDF of the paper: https://arxiv.org/pdf/2506.08184

X post by @sukh_saroy about it:

https://x.com/sukh_saroy/status/2031742412595618144?s=20

https://nitter.net/sukh_saroy/status/2031742412595618144?s=20


🚨Nobody is ready for this paper.

Every LLM you use (GPT-4.1, Claude, Gemini, DeepSeek, Llama-4, Grok, Qwen) has a flaw that no amount of scaling has fixed.

They cannot tell old information from new information.

A patient's blood pressure: 120 at triage. 128 ten minutes later. 125 at discharge.

"What's the latest reading?"

Any human: "125, obviously."

Every LLM, once enough updates pile up: wrong. Not sometimes wrong. 100% wrong. Zero accuracy. Complete hallucination. Every model. No exceptions.

The answer sits at the very end of the input. Right before the question. No searching needed.

The model just can't let go of the old values.
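To make the test concrete, here is a minimal Python sketch of the kind of prompt described above: a stream of updates to one key, followed by a question about the latest value. The function names (`build_prompt`, `ground_truth`, `padded_stream`) and the exact prompt wording are assumptions for illustration, not the paper's actual harness.

```python
def build_prompt(updates, query_key):
    """Render (key, value) updates in order, then ask for the latest value."""
    lines = [f"{key}: {value}" for key, value in updates]
    lines.append(f"What is the latest value of {query_key}?")
    return "\n".join(lines)


def ground_truth(updates, query_key):
    """The correct answer is simply the last value written for the key."""
    latest = None
    for key, value in updates:
        if key == query_key:
            latest = value
    return latest


def padded_stream(final_updates, n_stale, key="blood_pressure", start=100):
    """Prepend n_stale outdated readings (hypothetical values) to the stream."""
    stale = [(key, start + (i % 40)) for i in range(n_stale)]
    return stale + final_updates


# The three readings from the post: triage, ten minutes later, discharge.
readings = [
    ("blood_pressure", 120),
    ("blood_pressure", 128),
    ("blood_pressure", 125),
]
prompt = build_prompt(padded_stream(readings, n_stale=50), "blood_pressure")
answer = ground_truth(readings, "blood_pressure")
```

Raising `n_stale` is the knob that matters here: the correct answer never moves from the last line before the question, only the amount of outdated data in front of it grows.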

35 models tested by researchers from UVA and NYU. All 35 follow the exact same mathematical death curve. Accuracy drops log-linearly to zero as outdated information accumulates.

No plateau. No recovery. Just a straight line to total failure.
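"Log-linear" here means the line is straight when accuracy is plotted against the logarithm of the number of updates. A rough sketch of that shape, assuming the form accuracy = a - b * ln(n); the parameters `a` and `b` are hypothetical placeholders, not values from the paper:

```python
import math


def predicted_accuracy(n_updates, a, b):
    """Log-linear decline, clamped to the valid range [0, 1]."""
    raw = a - b * math.log(n_updates)
    return max(0.0, min(1.0, raw))


def updates_until_zero(a, b):
    """Update count at which the unclamped line crosses zero accuracy."""
    return math.exp(a / b)


# Hypothetical parameters, chosen only to show the shape of the curve.
a, b = 1.0, 0.2
breaking_point = updates_until_zero(a, b)
```

Under this form, each doubling of the update count costs the same fixed amount of accuracy (b * ln 2), which is why the curve shows no plateau.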

They borrowed a concept from cognitive psychology called proactive interference: old memories blocking recall of new ones. In humans, this effect plateaus. Our brains learn to suppress the noise and focus on what's current.

LLMs never plateau. They decline until they break completely.

The researchers tried everything:

  • "Forget the old values": barely moved the needle
  • Chain-of-thought: same collapse
  • Reasoning models: same collapse
  • Prompt engineering: marginal improvement at best

But here's the finding that should reshape how you think about AI infrastructure:

Resistance to this interference has zero correlation with context window length.

Zero.

It only correlates with parameter count.

Your 128K context window is not memory. It's a junk drawer that the model can't sort through.

The entire AI industry is charging you for longer context. This paper says context length was never the problem.

If you're building agents, memory systems, financial tools, healthcare pipelines, or anything that tracks changing data over time, you are building on top of this flaw.

And almost nobody is talking about it.