"ChatGPT is getting dumber, and quickly": Researchers at Stanford and UC Berkeley found its accuracy on math problems dropped from 97.6% to 2.4% in just three months

posted 128 days ago by Narg +87 / -0

What does this say about putting AI ever more in charge of our military?

I don't know; maybe nothing. We'll find out, sooner or later.

Sauce: https://search.brave.com/search?q=How+is+ChatGPT%27s+behavior+changing+over+time%3F&source=desktop&summary=1&conversation=08d642b3893e3fe2186361e3a221e77d4edb

https://x.com/heynavtoor/status/2031822709437640823?s=20

https://nitter.net/Jihooncrypto/status/2031673535895081238?s=20

Nav Toor @heynavtoor

🚨BREAKING: OpenAI told you every update makes ChatGPT smarter.

Stanford proved the opposite.

GPT-4's accuracy on math problems dropped from 97.6% to 2.4% in just three months. And nobody told you.

Researchers at Stanford and UC Berkeley tracked ChatGPT's actual performance over time. Same prompts. Same tasks. Different results. The model that nearly aced math questions in March was getting them wrong 97 out of 100 times by June.

Code generation collapsed too. In March, over 50% of GPT-4's code ran perfectly on the first try. By June, only 10% did. Same questions. Dramatically worse answers. Every silent update OpenAI pushed made the product you pay $20 a month for quietly worse at the things you actually use it for.

The researchers tested GPT-3.5 and GPT-4 across math, coding, medical exams, reasoning, and sensitive questions. The drift was massive and unpredictable. Some tasks improved. Others fell off a cliff. And there was no way for you to know which was which, because OpenAI never disclosed what changed.

Here's where it gets personal. If you used ChatGPT for code in March and it worked, then tried the same thing in June and it broke, you probably blamed yourself. You thought you prompted it wrong. You tried again. You wasted hours debugging your own questions. But it wasn't you. The model had silently changed underneath you.

OpenAI's VP of Product went on X and said "we haven't made GPT-4 dumber."

Stanford's data says otherwise.

97.6% to 2.4% is not a matter of opinion.

Every business building on ChatGPT's API, every student relying on it for schoolwork, every developer using it to ship code is standing on ground that shifts without warning. You trusted it yesterday. It changed overnight. Nobody told you.

You're not imagining it. ChatGPT is getting dumber. Stanford proved it.

33 comments