Now do you guys realise why datacenters and cheap energy are so crucial for the future?
More computational power with less energy cost = more token processing power.
He who can compute more can create the most powerful LLMs and solve the hardest problems.
We are at the point, compared with the evolution of the PC, where we could barely fit 16KB of RAM on a PC. Today we can fit close to a TB of RAM on a PC. That's a huge slope upwards, and it took 40 years to get there.
With LLMs the compute power will grow exponentially if Trump's plan works and will usher in centuries of prosperity and freedom.
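To put a number on the RAM comparison above, here is a quick sketch of the doubling rate it implies. It assumes binary units (16 KiB and 1 TiB) and takes the 40 years at face value; the function name is made up for the example.

```python
import math

KIB = 1024
TIB = 1024 ** 4

def doubling_time_years(start_bytes: float, end_bytes: float, years: float) -> float:
    """Average doubling period implied by growing from start to end over `years`."""
    doublings = math.log2(end_bytes / start_bytes)
    return years / doublings

# 16 KiB -> 1 TiB is 2**14 -> 2**40 bytes, i.e. 26 doublings.
doublings = math.log2(TIB / (16 * KIB))
period = doubling_time_years(16 * KIB, TIB, 40)
print(f"{doublings:.0f} doublings, one every {period:.2f} years")
```

So the "huge slope" works out to capacity doubling roughly every year and a half, sustained for four decades.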
Summary comparison — compute, GPUs, and energy (assumptions: public reporting, 2024–2026 hardware)
Key assumptions used: Grok (xAI) trains on large Nvidia H100/H800-based Colossus clusters (dense training at multi-petaflop scale); DeepSeek uses Mixture-of-Experts (MoE) designs on a mix of accelerators (Nvidia H800-class GPUs for the reported V3 training run, with Huawei Ascend / Cambricon hardware mentioned in some reports). Numbers below are order‑of‑magnitude estimates synthesized from published technical notes, reporting, and community analyses.
Raw compute (training)
Grok / Colossus-style dense training:
Training uses dense model compute, where every parameter contributes to every token. A frontier dense model in the 100B–6T class typically consumes tens to hundreds of PFLOP/s‑years of total compute (sustained PFLOP/s × years). Example scale: multiple millions to tens of millions of GPU‑hours on H100-class GPUs for the largest training runs.
DeepSeek / MoE:
MoE greatly reduces FLOPs per token because only a subset of experts is activated. Reported: DeepSeek‑V3 ~250 GFLOPs/token vs 2448 GFLOPs/token for a 405B dense model (paper claim). Reported GPU‑hour totals for DeepSeek‑V3 training are roughly an order of magnitude lower than comparable dense runs (papers/reporting cite low single‑digit million GPU‑hours vs tens of millions for some dense baselines).
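As a rough sketch of what the per-token figures above mean over a whole training run: total compute is just FLOPs per token times tokens seen. The two per-token numbers are the ones quoted above; the token count is a hypothetical round number chosen only for illustration, not a reported figure.

```python
GFLOP = 1e9

# Per-token training compute quoted in the text above (paper claim).
MOE_GFLOPS_PER_TOKEN = 250      # DeepSeek-V3 (MoE)
DENSE_GFLOPS_PER_TOKEN = 2448   # 405B dense model

# Hypothetical corpus size, assumed only for this example.
TOKENS = 10e12

def total_training_flops(gflops_per_token: float, tokens: float) -> float:
    """Total training compute in FLOPs for a given per-token cost."""
    return gflops_per_token * GFLOP * tokens

moe_flops = total_training_flops(MOE_GFLOPS_PER_TOKEN, TOKENS)
dense_flops = total_training_flops(DENSE_GFLOPS_PER_TOKEN, TOKENS)
ratio = dense_flops / moe_flops

print(f"MoE:   {moe_flops:.2e} FLOPs")
print(f"Dense: {dense_flops:.2e} FLOPs")
print(f"Dense / MoE: {ratio:.1f}x")
```

The ratio (about 9.8x) does not depend on the assumed token count, since it cancels out.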
Types of GPUs / accelerators
Grok / xAI:
Heavily Nvidia (H100/H800 family) with NVLink / NVSwitch for intra‑node high bandwidth. GPUs optimized for dense tensor compute and large memory bandwidth.
DeepSeek:
Uses MoE-friendly deployments; reported use of Huawei Ascend-family and H800-style accelerators in some deployments. MoE benefits from high interconnect bandwidth but can be optimized to reduce InfiniBand (IB) traffic (node‑limited routing); it can also run on mixed hardware, including lower‑cost consumer GPUs for inference with a suitable engine and quantization.
GPU counts and cluster design
Dense (Grok) clusters:
Very large single‑site clusters (reports of 1–1.5 GW datacenter power footprints for Colossus‑class installs). At roughly 1–2 kW of facility power per H100‑class GPU, that implies hundreds of thousands of GPUs or more for frontier training and large on‑demand inference capacity.
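A back-of-envelope sketch of how a facility power figure translates into a GPU count. The 700 W per-GPU draw (H100 SXM-class TDP) and the 2.0 overhead multiplier for host CPUs, networking, and cooling are assumptions for illustration, not reported Colossus numbers.

```python
GPU_WATTS = 700.0   # assumed per-GPU draw (H100 SXM-class TDP)
OVERHEAD = 2.0      # assumed multiplier for CPUs, networking, cooling

def gpus_supported(facility_watts: float, gpu_watts: float, overhead: float) -> int:
    """GPUs a facility can power if each one costs gpu_watts * overhead at the meter."""
    return int(facility_watts // (gpu_watts * overhead))

for gigawatts in (1.0, 1.5):
    count = gpus_supported(gigawatts * 1e9, GPU_WATTS, OVERHEAD)
    print(f"{gigawatts} GW -> about {count:,} GPUs")
```

With these assumptions a 1 GW site lands around 700,000 GPUs, so a gigawatt-class footprint points to hundreds of thousands of accelerators rather than tens of thousands.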
MoE (DeepSeek) clusters:
Fewer effective GPU hours required for equivalent capability; MoE still requires many GPUs for parameter storage and routing at scale but can hit similar performance with fewer active FLOPs and specialized routing to reduce cross‑node bandwidth. Reports estimate training DeepSeek‑V3 required a few million GPU‑hours on H800‑class gear (much lower than some dense baselines).
Electricity and power costs (training)
Dense (Grok):
If a Colossus facility is 1–1.5 GW peak, annual electricity for continuous operation is enormous (GW × hours × $/kWh). Example: 1 GW running continuously uses 8.76×10^6 MWh/year; at $0.05–0.12/kWh that's roughly $0.44–1.05 billion per year just for power (actual training uses a fraction of continuous peak, but peak facility capacity correlates with high power draw during training campaigns).
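The GW × hours × $/kWh arithmetic above can be checked with a few lines. This is only a sketch: it assumes the facility runs flat out all year, and the prices are the $0.05 and $0.12 per kWh endpoints from the paragraph above.

```python
HOURS_PER_YEAR = 8760

def annual_energy_mwh(power_gw: float) -> float:
    """Energy used in a year of continuous operation, in MWh."""
    return power_gw * 1000 * HOURS_PER_YEAR  # GW -> MW, then MW * h

def annual_cost_usd(power_gw: float, usd_per_kwh: float) -> float:
    """Electricity bill for a year of continuous operation."""
    return annual_energy_mwh(power_gw) * 1000 * usd_per_kwh  # MWh -> kWh

energy = annual_energy_mwh(1.0)
low = annual_cost_usd(1.0, 0.05)
high = annual_cost_usd(1.0, 0.12)
print(f"1 GW for a year: {energy:,.0f} MWh")
print(f"Cost: ${low / 1e6:,.0f}M to ${high / 1e6:,.0f}M per year")
```

That comes to about $438M at the low price and just over $1.05B at the high one, before cooling overhead or demand charges.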
MoE (DeepSeek):
Lower active FLOPs per token reduce total energy consumed for pretraining; published estimates for large MoE runs imply substantially lower electricity bills for comparable delivered performance. Concrete example: paper claims training requiring ~2.6M GPU‑hours vs dense models requiring 30M+ GPU‑hours — that gap multiplies into energy savings roughly proportional to GPU‑hours × per‑GPU power draw.
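Here is a minimal sketch of the "GPU‑hours × per‑GPU power draw" estimate mentioned above. The GPU‑hour figures are the ones quoted in the text; the 700 W per-GPU draw and the $0.06/kWh price are assumptions, and the default PUE of 1.0 means this counts GPU power only.

```python
GPU_WATTS = 700.0      # assumed per-GPU draw (H100/H800-class)
USD_PER_KWH = 0.06     # assumed electricity price

def training_energy_mwh(gpu_hours: float, gpu_watts: float, pue: float = 1.0) -> float:
    """Energy for a training run in MWh; pue scales in datacenter overhead."""
    return gpu_hours * gpu_watts * pue / 1e6  # Wh -> MWh

def training_cost_usd(gpu_hours: float, gpu_watts: float, usd_per_kwh: float,
                      pue: float = 1.0) -> float:
    """Electricity cost for the same run."""
    return training_energy_mwh(gpu_hours, gpu_watts, pue) * 1000 * usd_per_kwh

runs = {"MoE (~2.6M GPU-hours)": 2.6e6, "Dense (30M GPU-hours)": 30e6}
for label, hours in runs.items():
    mwh = training_energy_mwh(hours, GPU_WATTS)
    usd = training_cost_usd(hours, GPU_WATTS, USD_PER_KWH)
    print(f"{label}: {mwh:,.0f} MWh, ${usd:,.0f}")
```

Under these assumptions the GPU-only power bill is about $109k for the MoE run versus about $1.26M for the dense run. Either way the electricity for a single run is small next to the hardware cost, and the gap simply tracks the GPU‑hour ratio.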
Inference cost and hardware for deployment
Dense models (Grok):
High VRAM and throughput GPUs (H100/H800) for latency‑sensitive hosted inference; inference energy per token is higher because all parameters are active.
MoE models (DeepSeek):
Lower per‑token activation reduces inference FLOPs and memory traffic; can be cheaper to serve and, with model‑co‑design, can be run on more diverse hardware (including non‑Nvidia accelerators or consumer GPUs with quantization) for cost‑sensitive deployments.
Capital & operational cost tradeoffs
Dense approach:
Higher CapEx on uniform high‑end Nvidia GPUs, NVSwitch/NVLink networking, and larger datacenter power/cooling; simpler software stack for dense training and standard parallelism.
MoE approach:
Potentially lower compute and energy costs per performance unit but higher software complexity (routing, load balancing), more sensitive communication patterns, and potential need for co‑design of hardware/topology to maximize efficiency.
Caveats and uncertainty
Public numbers vary; some figures are from vendor/industry reports and preprints (DeepSeek paper excerpts) and unconfirmed press reporting for xAI/Colossus. Exact GPU‑hour totals, power footprints, and pricing are often proprietary.
MoE savings depend on gating efficiency, routing overhead, and how many experts are actually active per token; communication overhead can erode benefits if poorly implemented.
Regional electricity prices, datacenter PUE, and ownership vs cloud‑rental change $ estimates substantially.
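To make the gating caveat above concrete, here is a toy sketch of top-k expert routing. It is a generic illustration, not DeepSeek's actual router: the expert count, k, and the random gate scores are all hypothetical, and the weights are a plain softmax over the selected experts.

```python
import math
import random

def top_k_gate(logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]  # subtract peak for stability
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def expert_load(batch_logits, k):
    """Count how many tokens each expert receives under top-k routing."""
    load = [0] * len(batch_logits[0])
    for logits in batch_logits:
        for expert, _weight in top_k_gate(logits, k):
            load[expert] += 1
    return load

NUM_EXPERTS, TOP_K, NUM_TOKENS = 8, 2, 1000  # hypothetical sizes
rng = random.Random(0)
batch = [[rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)] for _ in range(NUM_TOKENS)]

load = expert_load(batch, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS
imbalance = max(load) / (sum(load) / NUM_EXPERTS)  # 1.0 means perfectly balanced
print(f"active experts per token: {active_fraction:.0%}")
print(f"tokens per expert: {load}")
print(f"busiest expert vs average: {imbalance:.2f}x")
```

Only k of the experts do work for each token, which is where the FLOP savings come from. The imbalance figure shows why load balancing matters: if one expert is much busier than average, the devices hosting the others sit idle waiting for it.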
If you want, I can produce a compact table with estimated GPU‑hours, per‑GPU power draws, and rough $ electricity costs for a few concrete training scenarios (assume H100/H800 specs and $0.06/kWh), using the numbers above.
So, in essence, Deepseek, by necessity, tries to lower the electricity bill to yield the same type of result.
Got it. Yes, there are a lot of optimisations possible with the parameters. Some of them can affect the quality of the model, and quantifying and reducing those effects is a big part of the research. The compute and energy requirements would be scaled down, but they would still grow linearly.
Here is my prediction - we will see a completely different base paradigm for training these models. Like valves vs transistors. When this happens, we will see an order of magnitude reduction in compute+energy usage and we will probably see multiple iterations of this.
This is what makes this timeline so amazing. For those of us who were of age at the infancy of computers/internet etc, to be able to see another epoch - even bigger than that - and be able to contribute is incredible.
From my perspective, it seems more that, with LLMs, the REQUIREMENT for computing power will grow exponentially, owing to the geometric growth of the network interconnections. Somebody needs to arrive at a cost-effectiveness metric for a process that may consume $ billions and produce intellectual morons.
I've seen some amazing hallucinations in AI-produced videos. I wouldn't want to have an AI-driven robot performing brain surgery on me. (Or MCAS flying an airplane, but that's another story.)
Hopefully this race to the top makes practical embedded NPUs viable. Ever been stuck at a red light for a while even though no cars are coming? Having an embedded NPU would enable you to not connect the camera to the cloud. Cheaper and more reliable infrastructure, more privacy, and less congestion to boot.
Claude is actually pretty damn good. If people would stop acting like whiny liberals and actually learn to leverage these tools, we could all the Q said we should be doing at 1000x what we do now.
It's not whether Claude is pretty damn good - I told you that Claude consistently stays at the top of the benchmarks - but what is the price for that little incremental extra performance (typically a 3-6 month lead)?
I started using Claude and burnt through the tokens very quickly. But then GPT, which is only slightly below Claude, cost me just $20 a month and did not affect the coding quality one bit - because I knew how to prompt it effectively, and its reasoning actually makes up for the base model quality.
So yes, Claude is a better model but too expensive, but GPT is by far the more cost effective model.
If people would stop acting like whiny liberals
It's worse than being whiny liberals. It's this insane fear of technology because they learnt the wrong lessons from 2020.
we could all the Q said we should be doing at 1000x what we do now.
Not sure what you meant to say there, but I find my productivity going up 50-100x easily.
We can spread memes 24/7 easily. That’s what I mean. Agree that Claude is overpriced. I use it for QA in my agent swarm via my sub and the CLI. GPT 5.5 is my brain. I use local models for content gen.
Microsoft worked out its own AI is cheaper to use for its purpose than Claude, and tbh I'm kinda shocked they didn't start with Copilot and went with Claude in the first place.
The reason was that too many people inside were using Claude on their own, so they decided to roll it out. The hype around Claude is insane. I have seen this first hand.
and the first companies to actually deploy these tools at real scale are already pulling back because the invoice arrived before the productivity gain was large enough to cover it...
Well, Duh!
Laughing in 2000 .com bubble...
I love when humanity management consistently "over does it"... and a few moments later... the bill comes due and there's a massive about face.
It's like nobody gets there's no such thing as a free lunch...as if there's no memory of, "good, fast, cheap - you can only pick two"
The light is breaking through..."humans are cheaper than AI" - I guess...
When wages haven't risen in decades and costs of Claude for instance are sky high - that holds true...but is everyone (human) - going to feel the already "running on empty" situation get squeezed even harder? Sure seems that way...
When will humans realize the juice ain't worth the squeeze anymore in this new modern slavery construct? Good thing everyone is really healthy tho... and able to brush off stress & function at peak performance...pass the honey buns and Red Bull
Microsoft invested $5 billion in Anthropic.. gave 100,000 engineers Claude Code access.. encouraged adoption.. watched usage explode.. then the invoices arrived.. and issued an internal order to cancel nearly all Claude Code licenses by end of June and force everyone onto their own cheaper tool..
If bills come in ...i.e. 30 day cycle? There are certain metrics that can be graphed to anticipate consumables. And no one said anything? Or was it .... eh .... unfavorable to say anything .....
This is first rate stupid. It gears towards the question how MS is really run. Can shareholders trust in management on par with the fiduciary obligation they entrust to management? How about H1B1 (sounds like a flu) hiring? Subpar?
The whole thing reminds me of 2000 => data center builds. Everybody was screaming tics, tics, tics, but there were no customers. Yet, DC were built en masse, and 20 year leases were arranged, but no one thought of simply staging over time, decreasing capital expenditures, easing on supply chains, etc, not to mention the design savings .... Man. A Box in a Box was revolutionary.
But then again, that would require cutting staff, and easy jobs ...like "making sure jobs".
Those “100,000 engineers” are definitely FULLY qualified indians.
This is what happens when they subvert companies, they import thousands of their low iq people so they can continue to operate their caste system and feed their shallow egos.
Then these uneducated and talentless cretins rely on AI to write their garbage code. Covering for each other and getting anyone who complains fired.
This is a huge problem. In addition to what you pointed out, the quality of big tech products has gone straight downhill since 2020. I remember the days when I could google a programming question and get a useful answer. Nowadays I regularly ask Grok for a list of search results.
The recent ban on H1B applicants staying here while in the queue is definitely a good start to solving it.
They have destroyed tech. Try to find a simple answer to a Linux terminal command usually ends up with 50 different bobble head meaningless answers and basically wading through spam.
Stop with this fear bullshit. AI isn’t going anywhere. Claude is notorious for wasting tokens and increasing costs with its coding tool. They aren’t using less AI, just different models. You don’t say gas powered era of cars is dying because someone switches from chevron gasoline to Exxon gasoline. That’s what’s happening here.
What if AI is a globalist psyop to get companies to ditch all their human employees and then rugpull once there are only AI employees? Then the economy collapses all at once? Would be poetic justice to the laid off. But the only solution would be one world centralized government. Save us, government!
I cannot withhold a sinister chuckle of vindication. Not only have people bitten off more than they can chew, they have bitten off more than they can swallow. That's bad news for boa constrictors and pythons.
Claude has always been an overhyped, very high price-per-performance that everyone kept pushing based on raw benchmarks.
I keep telling people that OpenAI is far more economical than Claude because of this. I guess Microsoft just figured this out.
And the spin that this means "AI Bubble has burst" is so silly, even based on the headline in this screenshot itself!
It's not just Claude.
Interesting then the direction Deepseek was going in.
BTW: I kinda like GAB as a meta AI. But indeed, Grok rules... on certain matters.
Elaborate?
Totally agree.
And true. This kind of transformation is effectively seeing science fiction coming into being.
I highly predict that this is where things will head. Personal cloud.
I actually have a lot of my stuff including photos sync straight to my home pc via pangolin
Let's just see what happens here. I do like Grok.
Exactly.
Claude is better.
They had their engineers milk Claude for all it's worth.
Now they will train Copilot on Claude's work.
Copilot will be as good and cheaper to run.
u/#popcornclown
That's not counting the stress when things break down and don't function correctly.
WHEN the Grandparents are using AI it is here to STAY!!
LOL
Someone was actually budgeting [/s]
Remove. Every. Last. One.
All they seem to do is copy and paste.
It's called "Think b4 you leap."
You think outside the box.
yes. LOL
If they read GA.W they would already have the solution. Maybe they'll re-discover this guy's theory: https://greatawakening.win/p/1ASsZF0pku/interesting-post-about-the-sun-a/
I have never seen this post, but it makes lots of sense. They have lied to us for so long I don't trust anything they said.