Buy a few GPUs so you can skip the Claude subscription, and never have to worry ...

weitendorf · 2026-06-07T01:14:45 1780794885

It’s not cheaper to run Claude in your own GPUs rather than the $200/mo for certain workloads. For a large portion of what I work on, the bottleneck is my time, not tokens. You certainly could throw more tokens at it but if you need it to work a certain way for certain reasons, and your plan/goals are beyond the scope of what the top-capability models can do, then throwing them at the problem just bogs you down in extra cruft or reviews/iteration that you could more effectively do being the primary driver of the work.

lrvick · 2026-06-08T00:07:32 1780877252

Sure, you can keep paying $200/mo to Anthropic forever, and accept heavy censorship on the types of tasks you can do (e.g. malware research), accept no privacy, and accept rate limiting and the requirement of internet access at all times.

Or buy $2400 of GPU today to get you something close to get you within 10% of Opus 4.6 on coding benchmarks, that pays for itself in 1 year, AND you can work with private code and data offline as you like with no censorship or restrictions.

The value proposition of Anthropic is comically bad to anyone that understands how to insert PCI-E cards into a motherboard and install linux.

momojo · 2026-06-06T07:22:28 1780730548

How long has this been your daily driver? How has this setup worked for you compared to enterprise models? Which models?

lrvick · 2026-06-06T08:25:56 1780734356

Maybe 2 months. I have mostly used the Qwen series, and currently running Qwen3.6 27B for programming and debugging and Qwen3.6 35B for speed and research. Both punch way way above their weight and replaced Qwen3.5 122B for me. Qwen 3.6 27B even is, for my workloads, preferable over Big Pickle (GLM-4.6) which is the only large third party model I have used extensively for reference and comparison as it is free and requires no signup or PII via OpenCode. My go to agent solution though is Charm Crush.

toomuchtodo · 2026-06-06T19:38:32 1780774712

Do you have a write up available on your build? A friend is looking for a similar solution where they can offer an API/service for internal use.

lrvick · 2026-06-07T23:51:51 1780876311

Not really much to write up.

Insert 2-4 $1200 r9700 GPUs in a Linux 7.0.0+ machine with 64GB+ of DDR4-5 memory, fire up llama.cpp, and connect with any OpenAI compatible tools.

A free public anonymous LLM like BigPickle can easily set up the software for you if in doubt.

throw1234567891 · 2026-06-06T15:36:01 1780760161

You are not shipping all your intellectual property to a third party. There’s nothing more valuable than that.

Frannky · 2026-06-07T03:01:54 1780801314

Yeah, the point was mostly that you can offload a lot of stuff to AI + code — stuff that before you would have needed people for.

Obviously, it becomes better to have local models running on your own hardware — that will be best. I don't think we are there yet, though. Software, yes. If you tweak Pi and DeepSeek Pro, you can get Claude-code-level stuff. You'd still need to buy the hardware, though. Not cheap. Eventually, it will get very cheap.

sharts · 2026-06-06T23:18:39 1780787919

How do the self hosted models compare in terms of foundational ones? Say comparing to opus 4.8 1M, etc?

lrvick · 2026-06-08T00:02:34 1780876954

Qwen3.6 27b is ~10% worse than Opus 4.6 to be fair (though at a fraction of the size), but in exchange you get to run offline with complete privacy, no rate limiting, no refusals from any task, be it malware research or otherwise. Also my favorite reason: controlling the means of production.

Those are all well worth being a month behind frontier models.