Local micro models are both fast and cheap. We tuned small models on our own dataset, and when a small model flags a piece of content as possibly belonging to a category we care about, we escalate it to the LLM.
This gives us really good recall at really low cloud cost and latency.
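
To make the escalation pattern concrete, here is a minimal sketch of a two-stage cascade. It is not our actual code: the names `cascade_classify`, `toy_small_model`, `toy_llm_judge` and the `escalate_threshold` value are all made up for illustration, and the two "models" are keyword stubs standing in for a tuned small model and a cloud LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadeResult:
    flagged: bool       # final decision
    escalated: bool     # True if the LLM was consulted
    small_score: float  # score from the local small model


def cascade_classify(
    text: str,
    small_model: Callable[[str], float],
    llm_judge: Callable[[str], bool],
    escalate_threshold: float = 0.2,  # hypothetical value, kept low to favor recall
) -> CascadeResult:
    """Run the cheap local model first; call the LLM only on suspicious inputs."""
    score = small_model(text)
    if score < escalate_threshold:
        # Confidently negative: stop here, no cloud call.
        return CascadeResult(flagged=False, escalated=False, small_score=score)
    # Possibly positive: let the LLM make the final call.
    return CascadeResult(flagged=llm_judge(text), escalated=True, small_score=score)


def toy_small_model(text: str) -> float:
    # Stand-in for a tuned small model returning a pseudo-probability.
    return 0.9 if "refund" in text.lower() else 0.05


def toy_llm_judge(text: str) -> bool:
    # Stand-in for a cloud LLM call that confirms or rejects the flag.
    return "refund" in text.lower() and "no refund needed" not in text.lower()


if __name__ == "__main__":
    for msg in ["hello there", "I want a refund", "no refund needed, thanks"]:
        print(msg, "->", cascade_classify(msg, toy_small_model, toy_llm_judge))
```

The threshold is the knob here: setting it low means the small model only has to rule out the obvious negatives, so most of the traffic never reaches the LLM while borderline cases still get a second look.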
Everything is built in-house, unfortunately. Many of our small models are tuned Qwen3, but we mostly chose whichever model was SOTA at the time we needed one trained.