
Local micro models are both fast and cheap. We fine-tuned small models on our data set, and when a small model flags content as matching what we're looking for, we escalate it to the LLM.

This gives us really good recall at really low cloud cost and latency.
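
For anyone wanting to picture the flow, here is a minimal sketch of that kind of cascade. It is not our actual code: `run_cascade`, `CascadeResult`, the toy models, and the 0.2 threshold are all made-up names and values for illustration. The assumption is that the small model returns a score in [0, 1] and that a deliberately low threshold keeps recall high, so the LLM only has to confirm or reject the candidates.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CascadeResult:
    text: str
    small_score: float  # score from the cheap local model
    escalated: bool     # True if the expensive model was consulted
    flagged: bool       # final decision


def run_cascade(
    texts: List[str],
    small_model: Callable[[str], float],
    large_model: Callable[[str], bool],
    threshold: float = 0.2,  # hypothetical value; tune on held-out data
) -> List[CascadeResult]:
    """Score everything locally; only call the large model above the threshold."""
    results = []
    for text in texts:
        score = small_model(text)
        if score >= threshold:
            # Candidate: pay for the large model to make the final call.
            results.append(CascadeResult(text, score, True, large_model(text)))
        else:
            # Confident negative: never leaves the machine.
            results.append(CascadeResult(text, score, False, False))
    return results


# Stand-ins for real models (assumptions, not real classifiers).
def toy_small_model(text: str) -> float:
    return 0.9 if "refund" in text.lower() else 0.05


def toy_large_model(text: str) -> bool:
    return "refund" in text.lower()


if __name__ == "__main__":
    for r in run_cascade(["I want a refund", "hello there"],
                         toy_small_model, toy_large_model):
        print(r)
```

In a real setup `small_model` would wrap the locally hosted fine-tuned model and `large_model` would wrap the cloud LLM call. The threshold is the knob that trades cloud cost against recall.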



I'd love to try this on my data set - what approach/tools/models did you use for fine-tuning?


Everything is built in-house, unfortunately. Many of our small models are tuned Qwen3, but we mostly picked whichever model was SOTA at the time we needed one trained.



