Oras · 2026-06-08T16:23:12 1780935792

1k TPS is great, but I’m more fascinated by the amount of AI generated comments in this thread!

trollbridge · 2026-06-08T16:54:16 1780937656

Comments at 1,000 TPS is a terrifying future.

0xbadcafebee · 2026-06-08T17:26:27 1780939587

I prefer a thousand smart AI comments to a thousand dumb human comments

wartywhoa23 · 2026-06-08T18:45:22 1780944322

Well, you can just vibecode a complete AI echochamber version of HN!

eli · 2026-06-08T16:24:37 1780935877

Like what?

adam_arthur · 2026-06-08T18:19:37 1780942777

There are many with subtle tells.

Not nearly as obvious as the ones from 6 months ago, but seems to be more the use of hyperbolic phrasing in a particularly unnatural way.

The assess/explain, then hyperbole at the end kind of structure.

Top comment looks suspicious from this perspective, but it's kind of a losing battle to be able to differentiate them with sufficient accuracy anyway

Oras · 2026-06-08T14:13:19 1780927999

The article does not put things in context. Raising $7 billion to continue innovating and serving a frontier model is not that much when you consider that Anthropic and Google are paying $1B per month for X data centre just to cope with inference demand.

Oras · 2026-06-08T14:02:24 1780927344

Congrats on the launch. I experienced these issues first-hand with `Open Finance` a few years ago.

I feel that you'll end up being an automation agency (you mentioned UiPath). Companies that have the skills and capacity to build will not need your service, but for those who want the full service, you might fill a gap.

I wish you all the best.

fkilaiwi · 2026-06-08T14:07:25 1780927645

thanks for the kind words! We have been seeing this pattern of customers wanting full service since we launched. Let's see how it goes!

Oras · 2026-06-08T07:48:44 1780904924

I might have a different take: I’m happy with this price per token, because it means only those who get real value out of it will use it.

There are so many useless cases, such as people bragging about their token consumption with no product and no value add, or those with OpenClaw doing useless automation that could be a Python script.

Oras · 2026-06-05T18:10:07 1780683007

Which part makes them look bigger than they are? Which services are larger than Stripe?

Oras · 2026-06-05T15:01:22 1780671682

Paying $5 a month for Digital Ocean or Hetzner will save you from the pain of using any of these cloud platforms for just a simple VM.

ARMack · 2026-06-07T19:59:14 1780862354

Thanks, will do some research on those two as well as the above, I appreciate the help.

Oras · 2026-05-29T20:27:31 1780086451

+1 Even though the startup didn’t work out (solo founder), I learned in 1.5 what I wouldn’t learn in 10 years.

Oras · 2026-05-29T20:20:33 1780086033

Sounds like they don’t have a moat at all. It’s like software consultancy with a data centre. And then the article mentions many customers using these models on prem (so data centre is not really a plus).

What’s stopping any country backed startup from fine-tuning small open source models?

whiplash451 · 2026-05-30T04:42:39 1780116159

Maybe because distilling small models from bigger ones that you control gives you better small models than fine-tuning from bigger models you don't control?

(I am not claiming it is the case, but stating this as an assumption)

rldjbpin · 2026-06-02T07:17:36 1780384656

their moat is where they are based and that they are making their own models. they have been around since before the distillation era of open-weights models.

their models may not hold up in mainstream comparisons, but they are pivoting to their own lane. whether they have scope beyond the local market is yet to be seen.

vb-8448 · 2026-05-30T17:46:27 1780163187

No one in Europe will buy from a random startup. The consultancy part is a MUST to do business with big corps, banks, finance, insurance, governments, public administration ...

Oras · 2026-05-30T19:38:54 1780169934

Mistral is a startup that happened to raise 100M. I said in my comment a “country backed startup”, which Mistral is.

vb-8448 · 2026-05-30T20:09:24 1780171764

The "consultancy" is their moat: if they are already in the company, they will catch most of the opportunities despite not having the SOTA model.

In Europe procurement cycles are insane, if you are somehow a "trusted vendor" you get a priority line, otherwise you need a lot of political support or ties with some C-level in a company.

Moreover, a lot of companies don't want to send their data to external providers (unless it's Microsoft, but that's a different story ...)

Oras · 2026-05-29T10:59:54 1780052394

As in, not custom chips like Groq and Cerebras. Did you expect a single GPU chip to reach 3k tps?

embedding-shape · 2026-05-29T11:10:04 1780053004

I think many would assume "not enterprise" or "not datacenter grade" when someone says "Standard GPUs", but maybe that specific phrase has a specific meaning I'm not familiar with.

Edit: I just tried a 4B model on a RTX Pro 6000, getting ~500 tok/s with llama.cpp not even trying to optimize or change anything, just default settings. I'm sure with vLLM it'd be a lot faster already, still before manually tuning configs. I wouldn't call that card "Standard GPU" either FWIW, but it makes the claimed performance numbers feel not as exciting, especially given the hardware they were using.

ismailmaj · 2026-05-29T11:13:00 1780053180

I expected a 4090, maybe 2. I did not expect 8xH200 for a 2B model.

gaeld · 2026-05-29T11:45:45 1780055145

Great points, let me clarify:

- model size: 2B is just for this preview (it was faster to implement), our article explains how we expect to support large frontier MoE at 1,000 to 5,000 tokens/s

- reaching 500 tok/s, or even up to ~1,000 tok/s, on a consumer GPU card is possible with existing inference engines like vLLM. But there is a ceiling.

The hard part comes when you try to be faster than that: these frameworks won't scale higher just by adding GPUs or using faster GPUs. There is a "glass ceiling" due to microseconds lost everywhere in the stack (grid syncs, inter-GPU comms, kernel launches, CPU sampling, etc.).

All our work at Kog is about removing these bottlenecks.
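To put rough numbers on that ceiling, here is a minimal sketch of the per-token arithmetic for a single sequential decoding stream. The overhead figure of 300 µs per token is purely a hypothetical value chosen for illustration, not a measurement from Kog or from any inference engine, and the function names are made up for this example.

```python
def per_token_budget_us(tokens_per_second: float) -> float:
    """Microseconds available per token for one sequential stream."""
    return 1_000_000 / tokens_per_second


def throughput_ceiling(compute_us: float, overhead_us: float) -> float:
    """Tokens/s when every token costs compute_us of useful GPU work
    plus overhead_us of fixed overhead (syncs, launches, sampling, ...)."""
    return 1_000_000 / (compute_us + overhead_us)


# Hypothetical fixed overhead per token; an assumption, not a measured value.
ASSUMED_OVERHEAD_US = 300.0

if __name__ == "__main__":
    for target in (500, 1_000, 3_000):
        print(f"{target} tok/s leaves {per_token_budget_us(target):.0f} us per token")

    # Faster GPUs shrink compute_us only; the fixed overhead stays put.
    for compute_us in (700.0, 350.0, 0.0):
        tps = throughput_ceiling(compute_us, ASSUMED_OVERHEAD_US)
        print(f"compute {compute_us:.0f} us + overhead -> {tps:.0f} tok/s")
```

Under that assumed overhead, even infinitely fast compute cannot push a single stream past 1,000,000 / 300 tokens per second, which is why shaving microseconds off the stack matters more than adding hardware once you are near the ceiling.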

dr_kiszonka · 2026-05-29T17:15:44 1780074944

Thank you for explaining. Do you think there are still opportunities for stack optimizations to meaningfully speed up inference on single consumer-grade GPUs?

gaeld · 2026-05-29T18:13:32 1780078412

I'm sure there are, and I really hope we can work on consumer-grade GPUs at some point.

It should be possible to apply the same methodology (digging deep into the hardware details to understand all its little characteristics, and rethinking the inference stack around that).

bcjdjsndon · 2026-05-29T14:12:46 1780063966

That doesn't clarify anything lol. It's a bit click baity.

bcjdjsndon · 2026-05-29T14:09:44 1780063784

> Did you expect a single GPU chip to reach 3k tps?

Did the article headline not say Standard GPU?

WithinReason · 2026-05-29T11:56:42 1780055802

so what would be the above-standard GPUs that they are excluding, then? Cerebras is not a GPU

Oras · 2026-05-29T10:58:51 1780052331

what did you have in mind when you read "Standard GPUs"?

yjftsjthsd-h · 2026-05-29T13:14:18 1780060458

The GPU in my desktop. (A normal-ish decent gaming machine that runs LLMs and txt2img well enough.)

In contrast, not enterprise GPUs that cost as much as a car.

gaeld · 2026-05-29T11:04:03 1780052643

I guess you were thinking of consumer GPUs. We are indeed talking about standard datacenter GPUs.

deflator · 2026-05-29T14:10:25 1780063825

What a lot of us on here are salivating for is the ability to run these on prosumer hardware at home. So we tend to jump to the conclusion that "standard" means "consumer-grade" because that's what we want to see. Still, very cool work!

gaeld · 2026-05-29T15:15:05 1780067705

thank you deflator, I understand this now! much appreciated

selicos · 2026-05-30T02:59:54 1780109994

A consumer "Standard GPU" could mean about a 6-8gb VRAM GPU still in support by the manufacturer, independent of CUDA/etc proprietary technology.

Recent Steam hardware survey top GPU list is:

- RTX 3060 (6 or 12gb VRAM)

- RTX 4060 (8 or 16gb)

- RTX 3050 (6 or 8gb)

- RTX 5070 (12gb)

- RTX 5060 (8gb)

- GTX 1650 (4gb!)

That list only covers about 22% of survey respondents but sets a 6-8gb VRAM baseline for consumer GPUs.

Can this run on an RX 570 8gb from 2017? Maybe that's a ways back. A 1660 6gb from 2019? Intel? They had a decent budget run in recent years.

https://store.steampowered.com/hwsurvey/videocard/

nightski · 2026-05-29T15:20:21 1780068021

How would you classify a datacenter GPU as standard/non-standard? That doesn't seem to be a meaningful distinction. It's click bait.

averne_ · 2026-05-29T16:49:53 1780073393

The blog makes it clear that "standard" GPU here is in opposition to purpose-built hardware like Cerebras. The selling point is reaching the same order of magnitude in generative speed as those approaches.

felooboolooomba · 2026-05-29T20:21:06 1780086066

Certainly not 8× AMD MI300X GPUs and 2,100 on 8× NVIDIA H200

bcjdjsndon · 2026-05-29T14:13:33 1780064013

You know, Radeon 9800 pro ago