TL;DR
Sarvam AI has introduced two new large language models (LLMs), including a 105B-parameter Mixture-of-Experts (MoE) variant trained entirely from scratch. The startup claims the 105B model outperforms global models such as DeepSeek R1 and Gemini Flash on key reasoning and Indian-language benchmarks.
Vichaarak Perspective: The Scalability and Efficiency Trade-off
Sarvam’s 105B MoE model is a bold signal that Indian AI startups aren't just fine-tuning; they're training at scale. MoE is the right architectural bet because only a fraction of the model's parameters are active for any given token, delivering high-capacity reasoning without the full compute cost of an equally large dense model. But the contrarian question remains: can a $100M-funded startup sustain the capital-intensive compute cycles needed to iterate against trillion-dollar rivals? Sarvam's play is focused on real-time enterprise use and advanced reasoning, suggesting a strategic pivot toward high-value corporate and government contracts rather than consumer chat.
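To make the efficiency argument concrete, here is a minimal sketch of top-k expert routing, the mechanism most MoE models use. Sarvam has not published its architecture, so everything below (the TopKMoE class, the dimensions, the expert count, the top_k value) is a hypothetical illustration rather than the company's implementation; the point is only that a router selects a few experts per token while the rest stay idle.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Toy Mixture-of-Experts layer: route each token to its top-k experts.

    Illustrative only; dimensions and expert count are assumptions, not
    details of Sarvam's model.
    """

    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                               # (tokens, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                   # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the others stay idle,
        # which is where the savings over an equally large dense layer come from.
        for slot in range(self.top_k):
            for e_id in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e_id
                w = weights[mask, slot].unsqueeze(-1)          # (selected, 1)
                out[mask] += w * self.experts[e_id](x[mask])
        return out


tokens = torch.randn(4, 512)       # a toy batch of 4 token embeddings
print(TopKMoE()(tokens).shape)     # torch.Size([4, 512])
```

This per-token routing is exactly the trade-off named in the heading above: total parameter count (capacity) grows with the number of experts, while per-token compute scales only with top_k.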
FAQ
What makes Sarvam AI's 105B model special? It uses a Mixture-of-Experts (MoE) architecture, in which a router activates only a small subset of expert sub-networks for each query, making inference faster and cheaper than a traditional dense model of comparable size.
How does it compare to global models? According to Sarvam, its 105B model outperforms DeepSeek R1 and Gemini Flash on benchmarks relevant to Indian contexts and general reasoning.
Who is behind Sarvam AI? Founded by Vivek Raghavan and Pratyush Kumar, Sarvam AI is backed by Lightspeed, Peak XV, and the Nilekani-backed Fundamentum.