V0 Landing release!

Sonoti now has a home on the web. The site walks through what we're building: a layer that cuts your LLM bill with prompt compression, semantic caching, and intelligent model routing, plus the observability to prove every dollar saved.
We started on this while working in Meta's Efficiency Org, where we saw teams competing for compute capacity and trying to cut spending after a year of aggressively pushing AI adoption.
Across companies, the priority has been speed of adoption, not optimization.
Without a unified layer that tracks AI usage and applies optimization methods across the stack, we saw teams at Meta introduce cost regressions, repeatedly solve similar cost problems, and overuse expensive models.
With usage growing faster than the available infrastructure can support, this is the perfect time to solve for LLM efficiency.
Have a look around, and if it's a fit, become a design partner. We'd love to put it to work on your traffic.