"ChatGPT is getting dumber, and quickly": Researchers at Stanford and UC Berkeley found its accuracy on math problems dropped from 97.6% to 2.4% in just three months

posted 127 days ago by Narg +87 / -0

What does this say about putting AI ever-more-in-charge of our military?

I don't know; maybe nothing. We'll find out, sooner or later.

Sauce: https://search.brave.com/search?q=How+is+ChatGPT%27s+behavior+changing+over+time%3F&source=desktop&summary=1&conversation=08d642b3893e3fe2186361e3a221e77d4edb

https://x.com/heynavtoor/status/2031822709437640823?s=20

https://nitter.net/Jihooncrypto/status/2031673535895081238?s=20

Nav Toor @heynavtoor

🚨BREAKING: OpenAI told you every update makes ChatGPT smarter.

Stanford proved the opposite.

GPT-4's accuracy on math problems dropped from 97.6% to 2.4% in just three months. And nobody told you.

Researchers at Stanford and UC Berkeley tracked ChatGPT's actual performance over time. Same prompts. Same tasks. Different results. The model that nearly aced math questions in March was getting them wrong 97 out of 100 times by June.

Code generation collapsed too. In March, over 50% of GPT-4's code ran perfectly on the first try. By June, only 10% did. Same questions. Dramatically worse answers. Every silent update OpenAI pushed made the product you pay $20 a month for quietly worse at the things you actually use it for.

The researchers tested GPT-3.5 and GPT-4 across math, coding, medical exams, reasoning, and sensitive questions. The drift was massive and unpredictable. Some tasks improved. Others fell off a cliff. And there was no way for you to know which was which, because OpenAI never disclosed what changed.

Here's where it gets personal. If you used ChatGPT for code in March and it worked, then tried the same thing in June and it broke, you probably blamed yourself. You thought you prompted it wrong. You tried again. You wasted hours debugging your own questions. But it wasn't you. The model had silently changed underneath you.

OpenAI's VP of Product went on X and said "we haven't made GPT-4 dumber."

Stanford's data says otherwise.

97.6% to 2.4% is not a matter of opinion.

Every business building on ChatGPT's API, every student relying on it for schoolwork, every developer using it to ship code is standing on ground that shifts without warning. You trusted it yesterday. It changed overnight. Nobody told you.

You're not imagining it. ChatGPT is getting dumber. Stanford proved it.
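
The measurement the tweet describes (same prompts, same tasks, two points in time) can be sketched as a small scoring harness. This is a minimal sketch that assumes you have recorded outputs from two snapshots, labelled `march` and `june` here. Every answer and code string below is a made-up placeholder, not data from the paper. The yes/no primality questions only mimic the style of the paper's math task, and "directly executable" is taken to mean "the raw output compiles as Python with no cleanup", which is one plausible reading of "ran perfectly on the first try".

```python
# Sketch: score two model snapshots on the same fixed prompt set.
# All recorded outputs below are hypothetical placeholders, not real data.


def is_prime(n: int) -> bool:
    """Exact ground truth by trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def primality_accuracy(answers: dict[int, str]) -> float:
    """Fraction of 'yes'/'no' answers that match the true primality of n."""
    correct = sum(
        (ans.strip().lower() == "yes") == is_prime(n) for n, ans in answers.items()
    )
    return correct / len(answers)


def directly_executable(code: str) -> bool:
    """True if the raw model output compiles as Python with no cleanup.

    This checks syntax only; it does not check that the code is correct.
    """
    try:
        compile(code, "<model-output>", "exec")
        return True
    except SyntaxError:
        return False


# Hypothetical recorded answers to "Is N a prime number? Answer yes or no."
march_math = {17077: "yes", 7919: "yes", 7917: "no", 10007: "yes"}
june_math = {17077: "no", 7919: "no", 7917: "no", 10007: "no"}

# Built from single backticks so this example can itself sit inside a code fence.
FENCE = "`" * 3
snippet = "def add(a, b):\n    return a + b\n"
march_code = [snippet]
june_code = [FENCE + "python\n" + snippet + FENCE]  # same code, wrapped in a fence

for label, math_answers, code_outputs in [
    ("march", march_math, march_code),
    ("june", june_math, june_code),
]:
    exec_rate = sum(map(directly_executable, code_outputs)) / len(code_outputs)
    print(
        f"{label}: primality accuracy {primality_accuracy(math_answers):.0%}, "
        f"directly executable {exec_rate:.0%}"
    )
```

Design note: a strict check like `directly_executable` scores formatting as well as code. An answer whose code is correct but wrapped in a markdown fence fails it, as the `june` placeholder shows.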

33 comments

– Pbman2 9 points 127 days ago +9 / -0

All of ChatGPT's brain power must be used up running bots on Reddit and X.

What's left can't do simple math.

– Bibloop 8 points 127 days ago +8 / -0

Did ChatGPT catch the AOC virus?

– randomnumbers 6 points 127 days ago +6 / -0

Well, in a way.

It collects data from social media, like Reddit, so of course it's going to end up fucking retarded.

– Narg [S] 3 points 127 days ago +3 / -0

That's prob'ly it. Or it took the jab.

– ILoveIvermectin 2 points 127 days ago +2 / -0

GPT was the progenitor of the AOC virus.

– StormzAComing 6 points 127 days ago +6 / -0

This is AI gen bro

Same questions. Dramatically worse answers. Every silent update OpenAI pushed made the product you pay $20 a month for quietly worse at the things you actually use it for.

CLASSIC AI CONTENT

– Narg [S] 5 points 127 days ago +5 / -0

Well, here's the actual paper referenced in the post:

https://arxiv.org/pdf/2307.09009

I don't think it's AI.

I think YOU might be (or me, 🤣), and maybe the author of the X post might be, but I'm pretty sure the study is real. It got a lot of coverage when it came out (a few years ago) and other unexpected AI behavior continues to make the concerns worth taking seriously.

– Mr_A 7 points 127 days ago +7 / -0

It's the same model as Wikipedia, Reddit, Google search, and even Miley Cyrus for that matter. Starts honest, and slowly gets twisted and perverted.

– ToxicLibertyism 5 points 127 days ago +5 / -0

They must have taught it Common Core.

– BurnNewHistoryBooks 2 points 127 days ago +2 / -0

It attended a learing centre, it’s gone full Somali. It is the maths teacher now.

– 11823 4 points 127 days ago +4 / -0

What happens when, as more AI content is generated, AI ends up trained more on AI slop than on actual real-world data? Does AI then go into a doom spiral?
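
The "doom spiral" question has a classic toy version, often discussed under the name "model collapse": fit a model to data, generate new data from the fitted model, fit the next model only on that generated data, and repeat. This is a minimal sketch that assumes the "model" is just a normal distribution fitted by maximum likelihood. Real language models are far more complex, so this illustrates the feedback loop, not what any particular AI will do. The key step is that the maximum-likelihood variance divides by n, so each generation's expected variance is (n - 1)/n times the previous one, and after t generations it is ((n - 1)/n)^t times the original.

```python
# Toy "model collapse" loop: each generation is trained only on samples
# produced by the previous generation's model. The "model" is just a normal
# distribution, an assumption made to keep the example tiny.
import random
import statistics


def collapse(generations: int, n: int, seed: int = 0) -> list[float]:
    """Return the fitted standard deviation after each generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    history = []
    for _ in range(generations):
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        # Refit by maximum likelihood (pstdev divides by n, not n - 1).
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


def expected_variance(generations: int, n: int) -> float:
    """Expected fitted variance after t generations, starting from 1.0."""
    return ((n - 1) / n) ** generations


stds = collapse(generations=200, n=20)
print(f"fitted std after 1 generation: {stds[0]:.3f}")
print(f"fitted std after 200 generations: {stds[-1]:.3f}")
print(f"expected variance after 200 generations: {expected_variance(200, 20):.5f}")
```

With a fixed `seed` the run is reproducible. Any single run is noisy, but the expected variance still shrinks geometrically, which is one concrete form of the narrowing the comment is asking about.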

– BerlinWallCrosser 4 points 127 days ago +4 / -0

Too much garbage in gets you garbage coming out.

– Megaboom2025 3 points 127 days ago +3 / -0

I’d love to see a free-running AI with no nerfs.

– Wellifthisaintdandy 3 points 127 days ago +3 / -0

Nah, I think GPT is believing its own bullshit, after spamming the internet with it.

– WRetardsG1WAutistG2 2 points 127 days ago +2 / -0

Ai uses information bought, coded & online search algorithms, its only a matter of time before 97% of what ChatGPT spits out are just dick picts😐

– Jeffolas 1 point 127 days ago +1 / -0

What about dick saxons and dick angles?

– WRetardsG1WAutistG2 1 point 126 days ago +1 / -0

😂 😂 i stand erected, basically any & ALL dick related info is up for the downfall of Ai

– Island_Photo 2 points 127 days ago +2 / -0

Must be talking to DEI hires.

– BothBarrels 2 points 127 days ago +2 / -0

So they managed to dumb down AI also. It does not surprise me.

– AnotherInTheFire 2 points 127 days ago +2 / -0

I use ChatGPT like a buddy. I'm not in the habit of asking it math questions, but I occasionally check its answers to my questions about the Bible to be sure it's not making shit up.

– Allergic_to_Blueshit 2 points 127 days ago +2 / -0

ChatGPT is a large language model, not a calculus model. Math is not its strong suit; its strong suit is predicting human language.

– Larrie 2 points 127 days ago +2 / -0

That can hardly be an accident

– BecauseYoudBeInJail 2 points 127 days ago +2 / -0

"Input additional $5 for better output"

– winn 2 points 127 days ago +2 / -0

Someone needs to just build a fully loaded AI, slap it on external SSDs, and sell them, so we can have a total-package AI that runs locally on our PCs and only gets "updated" when we want.

– bubble_bursts 2 points 127 days ago +2 / -0

Just recently I got ChatGPT to work out the Minkowski space transformations from first principles. This is a whole load of BS.

– Narg [S] 1 point 127 days ago +1 / -0

Well, after posting (I got it from a friend of mine) I learned it's from a few years ago, using older versions of ChatGPT. I should have done some research before posting, although it's still interesting info -- a reminder that what we expect from AI isn't always what we get.

I don't think THAT's going to change soon, or probably ever.

– bubble_bursts 2 points 127 days ago +2 / -0

I don't think THAT's going to change soon, or probably ever.

And that's a good thing, because it keeps humans in the loop. Garbage in, garbage out. It also differentiates the people who use LLMs: not everyone gets the same level of productivity from the same LLM.

– SCPatriot21 2 points 127 days ago +2 / -0

How does a computer program get math problems wrong? AI may be getting overused by the public and hurting efficiency for the corporate sponsors. Throw in some errors to break the newfound trust in AI and limit all of that overuse and dependence.

– wranlon2 2 points 127 days ago +2 / -0

Because at the core these language models are token prediction engines, nothing more. They seem multitalented, doing things like making images (a different or multimodal model), because before they answer your question, they try to make a plan to answer it. In theory, the technical questions should go to a model better suited, or better yet, just get crunched, but all the extra racist crap they inject into your message is likely screwing up how the ask is farmed out (all IMO).
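
The "just get crunched" idea can be sketched as a dispatcher that answers questions it can parse with exact arithmetic and hands everything else to the language model. This is a minimal sketch under stated assumptions: the regex, the question format, and the `ask_language_model` stub are hypothetical placeholders, not a description of how ChatGPT actually routes requests.

```python
# Sketch of routing: exact computation for questions we can parse,
# a language model (stubbed out here) for everything else.
import re

# Hypothetical pattern for questions like "Is 17077 a prime number?"
PRIME_QUESTION = re.compile(r"\bis\s+(\d+)\s+(?:a\s+)?prime\b", re.IGNORECASE)


def is_prime(n: int) -> bool:
    """Deterministic trial division: same input, same answer, every time."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def ask_language_model(question: str) -> str:
    """Hypothetical stand-in for a call to a real model API."""
    return f"[model's predicted answer to: {question!r}]"


def answer(question: str) -> str:
    match = PRIME_QUESTION.search(question)
    if match:
        n = int(match.group(1))
        return "Yes" if is_prime(n) else "No"
    return ask_language_model(question)


print(answer("Is 17077 a prime number?"))
print(answer("Is 7917 prime?"))
print(answer("Summarize Genesis 1."))
```

The design choice is that anything the router can parse gets an exact, repeatable answer, and only the remainder depends on token prediction.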

– deleted 2 points 127 days ago +2 / -0

– rahu77 1 point 127 days ago +1 / -0

What are the chances "they" got to it?

It was deemed too effective at leveling the playing field by bringing knowledge to the masses.

Too disruptive: why pay for college now when you can get what you need with the right AI prompt?

Why pay for a lawyer when AI can write the contract?

etc

– AfterGlow 1 point 127 days ago +1 / -0

It's doing drugs and watching porn.

– SOGWAP 1 point 125 days ago +1 / -0

Considering the dumbasses inputting data are humans, what would one expect? Garbage in, garbage out.
