Here are a few thoughts: - The publicly available information about how inferenc...

materielle · 2026-05-27T18:54:27 1779908067

I'm about to leave a shallow comment, but I am a bit skeptical of the supposed drop in inference costs. If AI labs saw a lot of potential there, they'd surely be bragging about it non-stop? So the fact that publicly available information is conflicted is probably a sign that at the very least, the numbers aren't amazing.

Yes I know there's no evidence and this is lazy reasoning. But there's probably a bit of truth to this line of thought.

Tuna-Fish · 2026-05-27T19:06:35 1779908795

Why on earth would AI labs be bragging about how little the product they sell actually costs them to make? You don't want to do anything that reduces its perceived value to the user, that might make them less willing to pay for it.

Also, inference costs are bound to go way down with more optimized architectures. GPUs are fundamentally not great at inference. No platform where the weights are streamed from a large pool of memory is. If the models ever quiet down, there will be massive step changes in cost/token, energy/token and tokens/second, as models are etched into silicon à la https://chatjimmy.ai/

overgard · 2026-05-27T21:33:58 1779917638

A couple of years ago Altman was saying the price of AI compute is going to drop 90% year over year or something like that, so I don't think they're nervous about talking about lowering their costs. They probably just haven't been able to lower their costs.

You have to keep in mind that about 99% of their announcements are targeted towards investors (their most important revenue source), so they're not going to be afraid to mention metrics that make the business look better.

bwhiting2356 · 2026-05-28T05:26:51 1779946011

Jevons paradox. Cheaper tokens do not mean we will spend less.

Skinney · 2026-05-28T08:07:45 1779955665

Cheaper tokens mean the company's margins increase, which would be valuable for investors to hear.

missedthecue · 2026-05-28T15:14:34 1779981274

The main limit to my token spend right now is that I'm running out of hours in a day.

mcmcmc · 2026-05-27T22:28:37 1779920917

Ah yes, Sam “Not Consistently Candid” Altman

pixelready · 2026-05-28T02:48:25 1779936505

Oh, is that the guy that sold Loopt by claiming it had hundreds of thousands of users and it turned out to have 500 DAU after his exit?

chipsrafferty · 2026-05-28T04:06:02 1779941162

Yep, the very same scammer. Wonder if he's lying about OpenAI too? Maybe about a person blowing a metal instrument?

whateveracct · 2026-05-28T04:53:15 1779943995

he lied. he's good at that.

golem14 · 2026-05-27T19:37:22 1779910642

Why would any company brag about their margins? Yet they do, to attract investors.

Tuna-Fish · 2026-05-27T19:43:07 1779910987

The key AI labs are not public companies; they are at liberty to brag about their margins to potential investors in private.

SiempreViernes · 2026-05-27T20:24:28 1779913468

And investors will leak such claims quickly enough that this reasoning cannot plausibly hide big secrets.

Tuna-Fish · 2026-05-27T21:29:47 1779917387

It's not a big secret. If you just do the math yourself, it's easy to compute that inference doesn't cost all that much. People just see all the capital investment going around and all the new data centers being built, see that it's spent on "AI", put two and two together and get a three, or "clearly serving AI requests costs an arm and a leg".

The 1 they were missing is that AI requires both training and inference, and training is by far the expensive part. And that in principle you can stop training at any point and keep using the models as they are. (But that means that if other companies keep improving their models, you'll be left behind...)

In contrast, inference is fairly cheap and all the providers have great margins on it. Eventually either investment in training stops having commensurate impact on model quality, and people stop doing that and instead concentrate on making inference faster and even more efficient. Or if that doesn't happen, things will get very weird very quickly.

whatever1 · 2026-05-28T00:39:40 1779928780

The market already shows where it will go.

If you want a frontier model, you will pay more for inference to essentially fund the expensive training.

If you don’t need a frontier model, you will get dirt cheap inference, which eventually will approach the cost of electricity spent per token.

mattmanser · 2026-05-28T13:24:03 1779974643

This is technically correct, but practically false.

They can't stop training as then the AI's knowledge will become out-of-date very quickly. Their knowledge stops the day you stop training.

flextheruler · 2026-05-28T16:32:45 1779985965

Yes it seems that this discussion that has sparked such controversy involves an already well defined concept in business.

Net margin versus gross margin.

Net shows profitability after subtracting all expenses, while gross only subtracts the cost of goods sold. Putting the model training costs into a one-time fixed expense provides a much better gross margin.

This is known as COGS reclassification or classification shifting and is a common tactic to mislead investors.

This is why analysts look at Free Cash Flow Margin.

WorldCom and MicroStrategy did this before the Dotcom Bubble imploded.
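
A minimal sketch of the gross-versus-net distinction described above. All figures are hypothetical and in arbitrary units, and the function names are made up for illustration; the point is only to show how moving training cost out of cost of goods sold changes gross margin while net margin stays the same.

```python
def gross_margin(revenue, cogs):
    """Gross margin: share of revenue left after cost of goods sold."""
    return (revenue - cogs) / revenue


def net_margin(revenue, cogs, other_expenses):
    """Net margin: share of revenue left after all expenses."""
    return (revenue - cogs - other_expenses) / revenue


# Hypothetical figures (arbitrary units), chosen only for illustration.
revenue = 100.0
inference_cost = 30.0   # treated as cost of goods sold
training_cost = 90.0    # the expense whose classification is in question

gm_training_excluded = gross_margin(revenue, inference_cost)
gm_training_included = gross_margin(revenue, inference_cost + training_cost)
nm = net_margin(revenue, inference_cost, training_cost)

print(f"gross margin, training outside COGS: {gm_training_excluded:.0%}")
print(f"gross margin, training inside COGS:  {gm_training_included:.0%}")
print(f"net margin either way:               {nm:.0%}")
```

With these made-up numbers, gross margin looks like 70% when training is kept out of COGS, while net margin is -20% regardless of how the training cost is labeled.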

ethin · 2026-05-27T22:53:36 1779922416

> If you just do the math yourself, it's easy to compute that inference doesn't cost all that much.

Show us your work, then. If it's so easy to do, this should be a trivial request to accommodate, no?

mediaman · 2026-05-27T23:32:39 1779924759

Just look at large open weights models being served by inference providers.

Kimi 2.6 is a 1 trillion total / 32B active parameter model that's something comparable to Sonnet. Sonnet's API pricing is $5 in, $15 out per million tokens. Deepinfra serves Kimi at $0.75 in, $3.50 out, and about the same at openrouter. So you're looking at a 4-7x multiple that Anthropic is charging compared to market rates that any plebe can get with a credit card.
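
A quick check of the multiple quoted above, using only the per-million-token prices stated in this comment. The dictionary and function names are made up for illustration.

```python
# Prices in USD per million tokens, as quoted in the comment above.
sonnet = {"input": 5.00, "output": 15.00}
kimi_deepinfra = {"input": 0.75, "output": 3.50}


def price_multiple(premium, baseline):
    """Ratio of premium price to baseline price for each token type."""
    return {kind: premium[kind] / baseline[kind] for kind in premium}


multiples = price_multiple(sonnet, kimi_deepinfra)
for kind, ratio in multiples.items():
    print(f"{kind}: {ratio:.2f}x")
```

That works out to roughly 6.7x on input and 4.3x on output, which is where the 4-7x range comes from.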

majormajor · 2026-05-28T01:02:59 1779930179

I'm not sure just how good that looks for Anthropic/OpenAI.

4-7x isn't a tiny markup, but how does that compare to high-margin internet businesses like AdSense? Meta and Google do hundreds of billions in ad revenue a year, and after taking out the publisher's portion (60-80% per some searching), I wonder what the ratio of the remaining tens-of-billions is against the compute cost and headcount required to run it.

And how much room for maintaining or improving that margin do they have if the cheap competitors also continue getting better? Is there a "good enough" point where the easier inference tasks are all moving to vendors massively undercutting them, and then they don't have the volume necessary to justify spending on further cutting-edge development?

re-thc · 2026-05-28T10:26:52 1779964012

> Kimi 2.6 is a 1 trillion total / 32B active parameter model that's something comparable to Sonnet.

No it's not. On some rigged paper maybe. Some such benchmarks say all models group together, which they clearly do not.

> Sonnet's API pricing is $5 in, $15 out per million tokens. Deepinfra serves Kimi at $0.75 in, $3.50 out, and about the same at openrouter. So you're looking at a 4-7x multiple that Anthropic is charging compared to market rates that any plebe can get with a credit card.

That's not saying much. You can get "cloud" at AWS and you can get a VPS. There is likely a 10x difference. It's not the "same". While AWS costs more, it doesn't have 7x margins either.

brookst · 2026-05-28T19:06:59 1779995219

I’m wary of “has not been leaked in a way that was picked up in public news” as proof or disproof of anything.

bwhiting2356 · 2026-05-27T19:57:02 1779911822

this is changing soon

joelthelion · 2026-05-27T20:48:11 1779914891

Not really, how much of a public company are you when 5% of your capital is public?

Tuna-Fish · 2026-05-27T21:22:48 1779916968

That doesn't matter for the legal requirements.

The short and only kind of wrong version is:

In the US, companies are not allowed to unfairly privilege some investors over others by giving them access to secret information that would let them judge the future prospects of the company. (Except in all the ways they can, but these usually involve some kinds of insider trading rules.) Private companies can handle giving out secrets to investors by literally writing a memo and mailing it to all their investors, if they want to give out some secrets to one of them.

Public companies cannot do that, even if they knew who all their investors were, but must instead consider every member of the public a potential investor, even if they don't already own the stock. Because of this, when public companies want to reveal material information about their future prospects, they must reveal it to everyone.

tverbeure · 2026-05-27T21:12:01 1779916321

The percentage is irrelevant for this discussion. As soon as you’re public, you need to report detailed financial numbers.

overgard · 2026-05-27T21:35:20 1779917720

Plus, you have to do real GAAP accounting, not their made up metrics.

fakedang · 2026-05-27T21:52:46 1779918766

That's changing with this administration though. Reduced reporting cycles reduce transparency.

mrosett · 2026-05-28T05:26:58 1779946018

It won't impact the disclosure of key business details because it doesn't reduce the level of disclosure needed in the S-1 or the 10-K.

kfse · 2026-05-28T14:31:33 1779978693

Besides the legal requirement, the reason these companies go public is often to provide liquidity for early investors or employees. So they do want to have as good a margin story as they can, at least in terms of unit margin.

jimnotgym · 2026-05-28T05:43:56 1779947036

This is an interesting anomaly in the US. In the civilised world all corporations have to file public accounts, as the price for their limited liability. The detail and audit requirements depend on the size, turnover, staff numbers etc. This is because the shareholders are not the only stakeholder. The company's creditors, for instance, who are exposed to the limited liability, have a right to see what they are lending to.

To answer the sibling comment, all of these public accounts follow local GAAP or IFRS.

The US still astounds me with its willingness to allow corporations to rip people off!

kortilla · 2026-05-28T06:23:21 1779949401

Creditors in the US can make visibility into financials a requirement for financing if they want. Protecting creditors isn’t a good argument for public reporting.

jimnotgym · 2026-05-28T17:42:32 1779990152

What about potential employees, can they look? The local community that consents to let the company build and operate in their town? How does that help, if they don't have to follow GAAP anyway?

kortilla · 2026-05-29T03:21:55 1780024915

Why are those things relevant to either employees or a town?

Most of the US is at-will so the financial health of the company is unlikely to be the reason you’ll suddenly lose a job.

Same for a town, if you’re structuring a deal that has counterparty risk then you mitigate the risk. If an employer is just leasing some office space in your town, why in the world would you ever even think you had the need to look at their financials?

VBprogrammer · 2026-05-28T07:28:05 1779953285

What are the arguments against public reporting?

As a consumer you are often sending deposits or even the full cost of goods to companies some time before you receive those goods (in effect you become a creditor). You are also dependent upon some of those companies for service and repairs. It seems reasonable that you can check the finances of a company you are creating a business relationship with, I know in the past I've checked company statements.

You are unlikely to have significant enough sway to force that kind of disclosure. Small businesses as consumers have less legal protection and are similarly unlikely to be able to make disclosure a precondition of a deal.

nradov · 2026-05-28T16:26:42 1779985602

So what. As a customer you can insist on seeing audited financial statements as a condition of purchasing, or purchase from another vendor, or do without. No problem.

VBprogrammer · 2026-05-28T21:05:48 1780002348

Or, in the real world, running a limited liability company could come with some sensible reporting requirements?

nradov · 2026-05-28T21:25:04 1780003504

Why? And what's sensible about it?

daemin · 2026-05-28T01:45:19 1779932719

Isn't there a limit on the public markets where if a company has less than a certain percentage of its ownership traded publicly then it is no longer a public company and therefore de-listed?

I remember hearing about a guy trying to squeeze out short sellers of his own company but ended up effectively taking his company private because he bought out like 95% of all the shares.

I wonder how that aligns to these small releases of stock for the public.

extraextra · 2026-05-29T14:01:03 1780063263

There is no legal minimum free float requirement before deregistration in the US; however, different exchanges have different rules.

Essentially, a stock has to stay above $1 per share, have a minimum market cap of $15m, a minimum of 400 shareholders, and "adequate" liquidity. If it meets those 4 criteria, it's essentially not at risk of deregistration.

lmm · 2026-05-28T01:00:02 1779930002

Growing companies don't brag about their margins, they brag about their growth and revenue. Margin talk is for when you're a mature company squeezing out every bit of profitability you can - if anything it would be a negative sign to be worrying about your margins when you're supposed to still be growing and innovating.

amarant · 2026-05-28T06:27:16 1779949636

I mean, did anyone expect them to not have margins? Why keep it secret?

Yoric · 2026-05-28T06:55:12 1779951312

> Why on earth would AI labs be bragging about how little the product they sell actually costs them to make? You don't want to do anything that reduces its perceived value to the user, that might make them less willing to pay for it.

Wouldn't they be bragging about it to investors? It feels like something that would matter a lot to them, and at least OpenAI kinda feels desperate to find them.

There's also the small question about whether a drop in inference cost would actually change anything about profitability, when training seems to get exponentially more expensive.

neltnerb · 2026-05-27T21:45:08 1779918308

Because companies that want to go public need to look profitable or potentially profitable. And before they go public they have to release real, actual, legally demonstrable numbers for their costs and revenue anyway.

extraextra · 2026-05-29T14:08:58 1780063738

When they actually file to go public, their numbers will be intensely scrutinized. That's all that global headlines will be talking about for weeks on end. Why would they create forward expectations before it's necessary?

Of course they don't want to create forward expectations in a volatile macro environment, with the public listing being 6 months out.

etempleton · 2026-05-28T00:37:02 1779928622

Because the most important thing for any pure play AI company right now is to prove they are a viable company. And sure they have proved they can make billions, but also that they can lose billions more. They are going to need even more money and to prove to the next round of investors at an even higher valuation that they are a viable business they need to show not that they can generate revenue, but that they can one day turn a healthy profit. And that is the trillion dollar question.

jimbokun · 2026-05-27T21:28:54 1779917334

I doubt having to replace every single chip in your data center every time you release a new model will bring down costs.

kopirgan · 2026-05-28T03:25:57 1779938757

Went to that URL asked one question - "how is this different from other AI" and it took 598/6144 tokens, not sure what that means.

philipswood · 2026-05-28T03:55:45 1779940545

Not super clear from the site itself, but this LLM is running on specialized silicon implementing just it. So it has super low energy use and blazing speed.

See https://taalas.com/products/

Edit: updated link

kopirgan · 2026-05-28T04:02:32 1779940952

Incredible increase over Nvidia! Need to read more.. Thanks!

DrewADesign · 2026-05-28T00:01:21 1779926481

Because they can think more than one quarter into the future? Why on earth would someone adopt something into their core workflow that was fantastically unprofitable? Uncertainty and business don’t mix. Most people aren’t hype-eating bacteria that only care about maximizing their next paycheck.

wheresmylogin · 2026-05-28T12:38:39 1779971919

One reason is that all the code you write with this goes in your private git. If using AI no longer is possible because of cost, you can still profit a lot from what you did with it before.

DrewADesign · 2026-05-28T16:21:15 1779985275

For consultants? Sure. What percentage of contractors are consultants? And is that better than going with something in your stack that’s sustainable even if it’s not totally optimal? I’d wager most would say no.

nradov · 2026-05-28T16:31:35 1779985895

Regardless of profitability there will always be multiple good LLM vendors as well as open-source alternatives (slightly worse but still pretty good). If one vendor fails then it's easy to switch your core workflow to a competitor.

DrewADesign · 2026-05-29T00:58:08 1780016288

On an individual basis for coding? Sure. If you’re a significant business with agents that do more nuanced work, which is the only kind of customer that will let any of these companies pay back those trillions of dollars as quickly as they need to to stay alive, these are not fungible services.

m463 · 2026-05-28T21:11:23 1780002683

I wonder if inference costs will go down...

or will it be like microsoft office, where the software bloats to use/fill current hardware?

(and in this case bloats might mean better thinking or pulling in more data)

kopirgan · 2026-05-28T03:20:32 1779938432

If inference costs drop 90% or whatever, that would be a massive write-off of hardware even before they gave any returns for it?! Given Chinese and others are snapping at the heels and would also benefit from such reduction in cost.

solarkraft · 2026-05-28T04:11:24 1779941484

> Why on earth would AI labs be bragging about how little the product they sell actually costs them to make?

Investor confidence. They have a bit of a need for cash (also an interesting part of the profitability discussion of course).

> Also, inference costs are bound to go way down with more optimized architectures

I agree. Jimmy is incredible, I wonder what non-toy use cases they have. Surely they’ll come out with updated chips soon.

That said, I was apparently a bit over-excited for Groq and Cerebras. I thought they’d quickly dethrone Nvidia for inference, but not so far. Even the GPT spark trial isn’t seeming to go far.

whatshisface · 2026-05-27T18:58:14 1779908294

Inference has traditionally been far less expensive than training. One public example is the fact that hobbyists can run StableDiffusion ($600k training costs[1]) on their personal computers.

Speaking to your point, inference being dramatically less costly than training would not be seen as a delta from the norm. The model of providing inference for anything near the operational costs (like a utility would) would be the delta from the norm if it were true.

[1] https://x.com/emostaque/status/1563870674111832066

thesz · 2026-05-27T20:31:09 1779913869

The difference between training and inference is 1) one has to keep intermediate results for the backward pass in training and 2) computation for training doubles because of the backward pass.

Training is also done over batches, which increases memory requirements by several orders of magnitude. This is why training needs costly compute.

One of the ways out of this unfortunate situation is to use something like Stochastic Average Gradient Descent [1]. Examples there are mostly concerned with regularized logistic regression, which makes the problem more or less convex. Neural networks are inherently non-convex. Still, maybe some ideas from there can be utilized in the context of neural networks, like the use of an estimated Lipschitz constant to derive curvature and an appropriate learning step.

  [1] https://www.cs.ubc.ca/~schmidtm/Courses/540-W19/L12.pdf

janalsncm · 2026-05-27T20:55:43 1779915343

So one way to think about it is roughly,

Training is inference + backwards pass (~2x inference cost) + activations (vram overhead) + optimizer (vram overhead) + gradients (vram overhead).

thesz · 2026-05-27T21:29:23 1779917363

Multiply "inference + backwards pass (~2x inference cost) + activations (vram overhead)" by batch size (thousands) to get to the actual RAM and compute cost. An optimizer like Adam adds only two or three model-sized copies of overhead.

And last, but not least, you need only one hidden layer kept in RAM for inference, but you need all of them (61 for Deepseek models) kept in RAM for computing gradient for one sample.
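
A rough accounting sketch of the memory terms listed in this sub-thread. Everything here is an assumption made for illustration: the function names, the arbitrary units, and the simplifications (no KV cache, no activation checkpointing, no mixed precision, activations the same size at every layer). It is a back-of-the-envelope model, not how any particular framework allocates memory.

```python
def training_memory(params, act_per_layer, layers, micro_batch, optimizer_copies=2):
    """Rough memory for one training step, in arbitrary units."""
    weights = params
    gradients = params                      # one gradient per weight
    optimizer = optimizer_copies * params   # e.g. Adam's two moment estimates
    # Activations of every layer are kept for the backward pass.
    activations = act_per_layer * layers * micro_batch
    return weights + gradients + optimizer + activations


def inference_memory(params, act_per_layer, batch=1):
    """Rough memory for a forward pass; only the current layer's activations are held."""
    return params + act_per_layer * batch


# Hypothetical sizes in arbitrary units; 61 layers is the figure quoted above.
train_units = training_memory(params=100, act_per_layer=1, layers=61, micro_batch=1)
infer_units = inference_memory(params=100, act_per_layer=1)
print(train_units, infer_units)
```

How large the activation term is relative to the weight-sized terms depends entirely on the assumed sizes and on the micro-batch size, which is why the replies disagree about how much batching matters.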

xyhopguy · 2026-05-27T23:30:12 1779924612

Microbatch size is a hyperparameter, it can be set to 1 and work just as effectively. With gradient accumulation it's equivalent even. Large batch sizes are used to increase parallelism, and sometimes to reduce variance in the loss signal (at the cost of increased bias).

Batch size is frequently limited by compute bottlenecks well before memory.
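
A small sketch of the gradient-accumulation point, using a toy one-variable linear regression in plain Python (the model, data, and function names are all made up for illustration). It shows that averaging micro-batch gradients, weighted by micro-batch size, reproduces the full-batch gradient of a mean loss.

```python
def grad_mse(w, b, xs, ys):
    """Gradient of mean squared error for the model y = w*x + b over one batch."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def accumulated_grad(w, b, xs, ys, micro):
    """Accumulate micro-batch gradients into the equivalent full-batch gradient."""
    total_dw = total_db = 0.0
    n = len(xs)
    for i in range(0, n, micro):
        mx, my = xs[i:i + micro], ys[i:i + micro]
        dw, db = grad_mse(w, b, mx, my)
        # Weight each micro-batch mean by its share of the full batch.
        total_dw += dw * len(mx) / n
        total_db += db * len(mx) / n
    return total_dw, total_db


# Toy data, invented for this example.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0]
w, b = 0.5, 0.0

full = grad_mse(w, b, xs, ys)
micro_1 = accumulated_grad(w, b, xs, ys, micro=1)
micro_3 = accumulated_grad(w, b, xs, ys, micro=3)
print(full, micro_1, micro_3)
```

The three results agree up to floating-point rounding; only the peak memory per step changes, since a micro-batch of 1 holds activations for a single sample at a time.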

mcv · 2026-05-28T08:18:57 1779956337

And of course you do all of this for every object in your training set, which is going to be larger than the total number of uses for any individual user.

galaxyLogic · 2026-05-27T23:14:11 1779923651

Does it matter what is the difference in size of needed inputs for inference vs. training?

whatshisface · 2026-05-27T23:04:08 1779923048

That is an estimate of the relative cost of one training step, but you have to multiply it by the number of training steps, an unknown quantity.

mike_hearn · 2026-05-28T11:38:34 1779968314

It's all got much more complex than that in recent years. Training now involves large amounts of inference for RL rollouts and similar. You can't disentangle them computationally like that. "Inference" is just the word used to mean serving customer traffic now, and "training" means creating the model you serve.

vanviegen · 2026-05-28T11:22:42 1779967362

I think in your StableDiffusion example, a lot more than $600k will have been spent on electricity alone for inference (on those personal computers you mention). So inference is more expensive than training.

lumost · 2026-05-27T21:41:24 1779918084

For equal capability tokens, there has been about a 10x drop in cost every 6 months.

We are still chasing the best because the best is moving rapidly, but it’s a simple thought experiment to work out what the cost to serve an 8B model from 2 years ago is in a world of 2T models.

Note: parameter counts are illustrative. Concretely, qwen3.6 27B delivers opus 4.5 capability at 1/27th the cost on openrouter. Single chip llama3 8b performance can exceed 17k tokens/sec.
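
To make the thought experiment concrete, here is a minimal sketch of what a steady 10x drop every 6 months would compound to. The 10x and 6-month figures are the ones claimed above; the function name and the assumption that the decline is smooth and exponential are mine.

```python
def relative_cost(months, factor=10.0, period_months=6.0):
    """Cost relative to today if cost falls by `factor` every `period_months`."""
    return factor ** (-months / period_months)


after_one_year = relative_cost(12)
after_two_years = relative_cost(24)
print(f"after 12 months: {after_one_year:.4%} of the original cost")
print(f"after 24 months: {after_two_years:.4%} of the original cost")
```

Under that assumption, equal-capability tokens would cost about 1/100th after one year and about 1/10,000th after two.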

byzantinegene · 2026-05-28T06:21:23 1779949283

8B models would be considered obsolete in the world of 2T models, at least if we're talking about the competitiveness of OpenAI/Anthropic. The only reason why they are valued so highly is their supposed dominance at the top end.

lumost · 2026-05-28T13:52:20 1779976340

The main story of agent use cases is in enterprise so far. An enterprise will only pay for a model capable of handling the task and no more. Most enterprises see no need to hire PhDs as factory line workers.

Coding is an interesting case as [1] the pace of progress has been absurd and [2] it's hard to put an upper bound on required capability. However, "hard to put a bound on" and "has no bound" are different: it's quite possible that the average engineer will cease to see the benefit of rapid progress, or that their employer will be satisfied with lower tier models.

How smart of a model do you need to build a high quality CRUD app for internal users? Or build a scalable web service?

byzantinegene · 2026-05-29T02:49:50 1780022990

yes, which is why the revenue growth story is not looking so great for Anthropic/OpenAI, when open-source alternatives are not far behind with much lower costs.

joshuahedlund · 2026-05-28T11:47:30 1779968850

> For equal capability tokens, there has been about a 10x drop in cost every 6 months

Is this still happening? Opus 4.5 was six months ago, can you get its capabilities for 1/10 cost now? Are we on track to get the same for 4.6 in a couple months?

lumost · 2026-05-28T13:47:19 1779976039

Pretty much. Kimi K2.6 is Opus 4.6 quality for coding. If you include discounts due to more efficient input caching, it is around 1/10th the cost of Opus 4.6.

https://openrouter.ai/moonshotai/kimi-k2.6

The march of cost efficiency moves on.

joshuahedlund · 2026-05-29T11:16:50 1780053410

Why haven’t I heard of this? Is it available in IDEs like Cursor?

no-name-here · 2026-05-28T03:15:29 1779938129

> I am a bit skeptical of the supposed drop in inference costs. If AI labs saw a lot of potential there, they'd surely be bragging about it non-stop?

Unless to the grandparent commenter’s point they’re using it to obscure their large prisoner’s dilemma (training) cost?

neuronexmachina · 2026-05-28T01:04:19 1779930259

> If AI labs saw a lot of potential there, they'd surely be bragging about it non-stop?

Google seems to pretty regularly post about how their TPU and algorithm advancements have been decreasing energy costs for both inference and training.

brookst · 2026-05-28T19:05:23 1779995123

What other companies brag about lowered costs? Isn’t that just a complicated way of asking customers to demand lower prices?

vlovich123 · 2026-05-27T20:10:19 1779912619

Small alternative potential future changes that alter this analysis:

* At some point model capability reaches diminishing returns. Then inference >> training in the future but training >> inference now. It’s not a prisoner’s dilemma but a land grab to solidify market position and be one of the 2-3 firms left standing as dominant in the space. The model companies aren’t super sticky yet but they’re working on it.

* even if training remains >> inference, it’s possible to have multiple price points like they do today. If you need the most capable model you’ll be paying exponentially more per token to supplement the training cost even though the serving cost is marginal because most people will be satisfied with cheaper / less capable models for most tasks.

I buy that inference is a dropping line item while training is a growing one. There are all sorts of things on the horizon that’ll be orders-of-magnitude improvements, from startups burning models into ASICs to get orders of magnitude more performance to alternate architectures like diffusion transformers that have orders of magnitude structural optimizations. It’s inevitable that it’ll come down even further from where we are. It’s possible model training also will go down but I’ve not seen any compelling research suggesting major “easy” reductions here.

janalsncm · 2026-05-27T21:01:18 1779915678

The issue is that most tasks do not require frontier-level intelligence, but companies like OAI can really only profit off of the frontier. Capabilities from a year or two ago are so outdated that even OpenAI gives it away for free and there are many other models biting at their heels. In other words they are spending huge amounts of money to cash in on a depreciating asset.

So one possible future is that frontier-level training becomes so expensive and the use cases so sparse that it simply isn’t viable to keep going bigger.

extraextra · 2026-05-29T14:36:55 1780065415

Once the land grab is over, the market will consolidate and the winners will absorb the losers. Then the few winners will be the only ones with real capital to train frontier models and will have true pricing power. Similar to how social media companies or the gig-economy benefits from network effects, AI companies will benefit from having the lion's share of paying customers (that also constantly feed in more data to train the models on).

twobitshifter · 2026-05-28T01:47:47 1779932867

We have GPU costs, power costs, and how many tokens/s models can generate on those GPUs. It’s possible to figure out the marginal cost based on this. The current estimate is about $0.40 per million tokens for a GPT-4-equivalent model. Sonnet 4 is $15 per million tokens, so they are charging high margins on inference. The issue is how large of a margin is needed to recover their costs before the GPUs age out, and how high of a margin can be charged before it’s not economically viable.

https://www.gpunex.com/blog/ai-inference-economics-2026/
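
A minimal sketch of the marginal-cost arithmetic described above. The GPU rental price, power draw, electricity price, and throughput are hypothetical placeholders chosen only to show the calculation; the $0.40 and $15 figures are the ones quoted in this comment. Function names are made up.

```python
def cost_per_million_tokens(gpu_usd_per_hour, power_kw, usd_per_kwh, tokens_per_second):
    """Marginal serving cost in USD per million tokens for one GPU."""
    hourly_cost = gpu_usd_per_hour + power_kw * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost / tokens_per_hour * 1_000_000


# Hypothetical inputs: $2/hour GPU, 0.7 kW draw, $0.10/kWh, 1500 tokens/s aggregate.
estimate = cost_per_million_tokens(2.00, 0.7, 0.10, 1500)

# Figures quoted in the comment above.
quoted_cost = 0.40
quoted_price = 15.00
markup = quoted_price / quoted_cost
gross_margin = 1 - quoted_cost / quoted_price

print(f"sketch estimate: ${estimate:.2f} per million tokens")
print(f"markup on quoted figures: {markup:.1f}x, gross margin {gross_margin:.1%}")
```

This ignores hardware depreciation schedules, utilization below 100%, and input-versus-output pricing, all of which matter for the "recover costs before the GPUs age out" question.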

rudedogg · 2026-05-28T02:01:36 1779933696

That seems way off to me.

I skimmed the article, but couldn’t spot any details on their estimates. They mention 70b+ params as being large in several places. But we’ve had several 100b+ param models that trail Sonnet.

zozbot234 · 2026-05-28T09:52:38 1779961958

Why would power spikes from training runs imply training>>inference? The cost of a training run scales with energy, whereas power is energy per unit time. All that tells you is that they're speeding up their training run so it will take less time overall (probably chasing some first-mover advantage, where they're out with a given model before their closest competitors), whereas they obviously can't do that for inference (which is a steady flow of requests over time).
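
The power-versus-energy distinction in one line of arithmetic. The numbers are hypothetical, and the sketch assumes total energy does not change with how fast the run is executed (it ignores any efficiency loss from running hotter or on more hardware).

```python
def energy_mwh(power_mw, hours):
    """Energy in megawatt-hours: power multiplied by time."""
    return power_mw * hours


# Hypothetical training run executed at two different speeds.
slow_run = energy_mwh(power_mw=100, hours=1000)
fast_run = energy_mwh(power_mw=200, hours=500)
print(slow_run, fast_run)
```

Doubling the power draw while halving the duration leaves the energy, and so the energy bill, unchanged; only the size of the spike differs.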

somewhereoutth · 2026-05-27T23:10:10 1779923410

Yes the huge discrete stepwise training spend is critical.

Maybe investors will realise that "the only winning move is not to play".

And so we are left with (as was) frontier models getting more and more out of date as whoever their post-bankruptcy custodians are try to eke out pennies on the dollar for inference on their decaying property. Perhaps along with local and/or highly specialized models still feeding on the after-glow of the huge amount of training that was (and is no longer) done.

The next AI winter is going to be deep, savage, and long.

extraextra · 2026-05-29T14:59:36 1780066776

Bankruptcies? The winners will gobble up the losers and the few remaining players will have pricing power. Don't be naive thinking that OpenAI or Anthropic can possibly go bankrupt. There will always be someone happy to buy them up for a nice price. Yes, the market will have to go through a consolidation phase though.

galaxyLogic · 2026-05-27T23:18:17 1779923897

> frontier models getting more and more out of date

Why are they getting out of date? Is it because we have new content from the internet that the older models did not have? Or are we simply trying to increase the size of the training data? In other words, is it not about being more up-to-date in terms of when the content was created, but about wanting to use bigger training input sets?

somewhereoutth · 2026-05-28T13:43:22 1779975802

lack of new content from the internet will make them go out of date. Not just facts and figures, but (for example) new programming languages/techniques.

galaxyLogic · 2026-05-29T01:20:34 1780017634

I see, makes sense. Then it kinda says that the quality of the model is the topicality of its inputs, I assume.

IX-103 · 2026-05-28T02:57:09 1779937029

I don't see how it would be possible for inference costs to dominate training costs, even after amortization.

Training involves multiple passes over the entire training dataset, ideally in large batches where you can perform inference on as many samples as possible simultaneously and then perform backpropagation to adjust the model weights (which is about as expensive as inference).

Let's consider the size of the dataset we're dealing with here. The dataset likely consists of practically every piece of digitized text they can get their hands on (including that extracted from audio and video). We know Google has digitized a large portion of the books in existence as part of their "search book contents" feature and we have no reason to believe they're not using it alongside their cache of 90+% of the internet to train their models. We're talking about 100s of millions of books each with an average of 100,000s of tokens. The internet has 10s to 100s of billions of pages on it with who knows how many tokens on average. This is a huge dataset that we've got to go through hundreds of times.
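A rough sketch of that arithmetic in Python, for anyone who wants to plug in their own numbers. Every constant is an assumption: the book and page counts are the low ends of the ranges above, and `TOKENS_PER_PAGE` is a made-up placeholder since no figure is given for it.

```python
# Order-of-magnitude estimate of the training corpus described above.
# All constants are illustrative assumptions, not measured values.

BOOKS = 100e6             # "100s of millions of books" (low end)
TOKENS_PER_BOOK = 100e3   # "100,000s of tokens" per book (low end)
WEB_PAGES = 10e9          # "10s to 100s of billions of pages" (low end)
TOKENS_PER_PAGE = 1_000   # hypothetical placeholder; no figure is given
PASSES = 100              # "hundreds of times" (low end)


def corpus_tokens(books, tokens_per_book, pages, tokens_per_page):
    """Unique tokens in the corpus: books plus web pages."""
    return books * tokens_per_book + pages * tokens_per_page


unique_tokens = corpus_tokens(BOOKS, TOKENS_PER_BOOK, WEB_PAGES, TOKENS_PER_PAGE)
tokens_processed = unique_tokens * PASSES  # tokens seen over all passes

print(f"unique tokens:    {unique_tokens:.1e}")
print(f"tokens processed: {tokens_processed:.1e}")
```

With these low-end assumptions the corpus is about 2e13 unique tokens and about 2e15 tokens processed over all passes. The result is only as good as the inputs, and the number of passes dominates it.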

Second, let's consider the effect of batching and how it sets requirements for our hardware. We know that larger batch sizes converge faster, are more stable, and produce better models. So if you want a good model you need large batch sizes. This means that you need machines several orders of magnitude more powerful than you use for inference. From what I heard Google uses clusters of 100s of their TPUs all located in a single rack for training. These clusters are organized in a customized computing architecture to maximize memory locality between cores (really critical for efficient back-propagation). Further, you can't use reduced precision weights for training like you can for inference, so there are no shortcuts.

Finally, the initial training stage is followed by reinforcement learning stages - this is a key development in how AI models have improved in the past year. This may mean going through a curated set of traces (either synthetic or captured from users) and adjusting the weights based on experienced outcome.

Overall there are so many orders of magnitude more work and more hardware requirements for training that I find it improbable that inference dominates. The number of "inference" steps in training is freaking ridiculous and includes such factors as the "number of words ever written".

atq2119 · 2026-05-28T04:46:04 1779943564

It's been a while since I saw a detailed paper on a high end training run, but extrapolating from what I remember, it seems those training runs are in the 10s of trillions of tokens. This already accounts for potentially sampling tokens multiple times during the training run.

That seems like a large number, until you realize that OpenAI claims to have almost a billion weekly users. And OpenRouter shows many models at over a trillion tokens per week.

So in pure token terms, I'd say it is in fact extremely plausible that inference dominates, at least for the popular models.
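A back-of-the-envelope version of that comparison, as a sketch. The constants are assumptions: 20 trillion stands in for "10s of trillions" of training tokens, 1 trillion per week for a popular model's inference volume, and the factor of 3 is the common rule of thumb that a training step (forward plus backward pass) costs roughly three times a forward pass per token.

```python
# How many weeks of serving until inference work matches the training run?
# All constants are illustrative assumptions.

TRAIN_TOKENS = 20e12               # assumed stand-in for "10s of trillions"
INFERENCE_TOKENS_PER_WEEK = 1e12   # "over a trillion tokens per week"
TRAIN_COST_FACTOR = 3.0            # assumed: forward + backward ~ 3x forward


def weeks_to_match(train_tokens, weekly_inference_tokens, train_cost_factor=1.0):
    """Weeks of inference needed to equal the training run.

    With train_cost_factor=1.0 this compares raw token counts; a larger
    factor weights each training token as more expensive than an
    inference token.
    """
    return train_tokens * train_cost_factor / weekly_inference_tokens


by_tokens = weeks_to_match(TRAIN_TOKENS, INFERENCE_TOKENS_PER_WEEK)
by_compute = weeks_to_match(TRAIN_TOKENS, INFERENCE_TOKENS_PER_WEEK, TRAIN_COST_FACTOR)

print(f"weeks to match by token count: {by_tokens:.0f}")
print(f"weeks to match by compute:     {by_compute:.0f}")
```

Under these assumptions inference catches up in 20 weeks by token count, or 60 weeks once training tokens are weighted 3x. This ignores batching efficiency, hardware differences, and experiments that never ship, so treat it as a sanity check only.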

johnecheck · 2026-05-28T05:01:41 1779944501

Not saying you're wrong, but I'll note why inference might dominate despite everything you mentioned.

A given model is trained once but applied N times. A large enough N will dominate training, no matter how complex and costly it was.

But how long is a model useful for? How often will labs need to train new models? Time will tell.
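The "trained once, applied N times" point can be written down as a small sketch. The dollar figures are purely hypothetical placeholders, chosen only to show where the crossover sits.

```python
# Inference share of lifetime cost for a model trained once and used N times.
# The dollar amounts are hypothetical placeholders, not real figures.


def inference_share(training_cost, cost_per_request, n_requests):
    """Fraction of total lifetime cost that goes to inference."""
    inference_cost = cost_per_request * n_requests
    return inference_cost / (training_cost + inference_cost)


def break_even_requests(training_cost, cost_per_request):
    """Number of requests at which inference spend equals training spend."""
    return training_cost / cost_per_request


TRAINING_COST = 100e6     # hypothetical: one training run, in dollars
COST_PER_REQUEST = 0.01   # hypothetical: serving cost per request, in dollars

n_star = break_even_requests(TRAINING_COST, COST_PER_REQUEST)
for n in (n_star / 10, n_star, n_star * 10):
    share = inference_share(TRAINING_COST, COST_PER_REQUEST, n)
    print(f"N = {n:.1e} requests -> inference share = {share:.0%}")
```

At the break-even N the split is 50/50. At a tenth of that, inference is about 9% of the total, and at ten times it is about 91%. So the open question is whether a model stays in service long enough to get well past its break-even N before it is replaced.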

upbeat_general · 2026-05-28T05:39:32 1779946772

This statement has been well known to be incorrect for at least a year.

extraextra · 2026-05-29T15:09:08 1780067348

Great points.

- At the end of the day those are still private companies (albeit huge ones), so we can only speculate about the state of their private financial situation. Once they decide it's the right time to IPO, they'll publish all their financials and we'll start to have a clearer picture.

- Later, each company will slightly specialize and have a different go-to-market strategy, which will allow us to understand on a deeper level what works in the market and what doesn't (think about how Facebook, Instagram and TikTok are all huge universal social media platforms, but each with a different target audience and different user base).

- Finally, the market will go through a consolidation phase in which winners will gobble up the losers and then the incumbents will have a real moat (against newcomers) and real pricing power on their user base.

stevenally · 2026-05-28T00:19:04 1779927544

> If we don't even know the ratio between amortized capital expenses and operational costs, outside investor analysis is impossible.

And yet we surely need this data for the IPO? Or are they relying on rule changes on the indexes to force ETFs to buy shares?

extraextra · 2026-05-29T14:56:10 1780066570

The IPOs are months away, potentially 6 months or more. We're in a volatile macro environment. AI companies have all the incentives to not create higher expectations regarding their financial situation a long time before the IPO. Obviously at IPO they will have to disclose their full financial situation.

The market is super hyped anyway for their IPOs. If they raise investors' expectations now and things change before the IPO, investors will be disappointed. It's a lose-lose proposition.

The smart play for any company is to keep their cards close to their chest until close to the IPO time.