Demystifying Kruti

India’s technological landscape is as diverse as its people, with over 1.4 billion individuals speaking more than 19,500 languages and dialects and navigating a mix of urban innovation and rural traditions. Yet this diversity comes with challenges: language barriers, limited access to intuitive technology, and a need for solutions that reflect India’s cultural and economic realities. Kruti, launched by Ola Krutrim, steps into this space as an AI assistant designed to meet these challenges head-on.
Kruti isn’t just another conversational AI; it’s a multimodal, transactional, and culturally attuned platform that redefines how Indians interact with technology. From enabling a farmer in Tamil Nadu to check crop prices via voice commands to helping a Mumbai professional book a cab and order lunch in one seamless flow, Kruti bridges the gap between complex needs and simple solutions. Its purpose is clear: to empower users by making technology accessible, efficient, and deeply integrated into daily life.
In this first part of our three-part series, we’ll unpack Kruti’s core features in exhaustive detail: its interaction modes, specialised agents, multimodal capabilities, and what sets it apart from the competition. Whether you’re a user curious about its capabilities or a tech enthusiast eager for insights, this comprehensive overview will leave no stone unturned.
Primary Interaction Modes
Kruti’s design revolves around flexibility, offering three distinct interaction modes to cater to a spectrum of user needs. Each mode is optimised for specific scenarios, ensuring that Kruti is both a quick helper and a deep problem-solver.
Auto Mode: Speed and Simplicity
Auto Mode is Kruti’s answer to fast-paced, on-the-go interactions. It’s built for users who need instant responses without the fluff; think of it as your personal efficiency assistant. Here’s how it works:
- Purpose: Handles straightforward queries and tasks with minimal back-and-forth.
- Use Cases:
  - Weather Check: "What’s the temperature in Delhi today?" Kruti responds: "It’s 32°C with clear skies."
  - Match and Current Trends: "Tell me the final standings of IPL 2025." Kruti responds with a summary, the final standings table, and the winner.
- Technical Insight: Auto Mode leverages LLMs for rapid intent recognition, prioritising speed over depth. It’s ideal for single-turn interactions where context isn’t critical.
In-depth Mode: Knowledge and Exploration
When users need more than a quick answer, In-depth Mode steps in. This mode is designed for research, learning, or tackling multi-layered questions, delivering detailed and contextual responses.
- Purpose: Provides comprehensive explanations or step-by-step guidance.
- Use Cases:
  - Research: "Tell me about the history of the Indian railway system." Kruti offers a multi-paragraph response, covering its colonial origins, post-independence growth, and modern electrification efforts.
  - Drafting: "Help me create a report on Chanakya’s Arthashastra and how it applies in today’s global economy." Kruti generates a polished draft.
  - Learning: "Explain how solar panels work." Kruti breaks it down, from photovoltaic cells to energy conversion, with tables and diagrams described in text.
- Technical Insight: In-depth Mode relies on multi-step reasoning and prioritises coherence and factual accuracy, drawing on a vast knowledge base tailored to Indian contexts.
Agents Mode: Task Delegation and Automation
Agents Mode is Kruti’s crown jewel, a hands-off approach where users delegate entire tasks to specialised AI agents. This mode transforms Kruti from a responder into a doer.
- Purpose: Executes end-to-end workflows, from planning to completion.
- Use Cases:
  - Cab Booking: "Get me a cab to the airport." The Cab Booking Agent finds options, books the ride, and tracks it, all without further input.
  - Food Ordering: "Order biryani for dinner." The Food Ordering Agent suggests restaurants, confirms your choice, and places the order.
  - Bill Payments: "Pay my phone bill." The Bill Payments Agent retrieves the amount, processes payment, and sends a confirmation.
- Technical Insight: Agents Mode relies on an agentic architecture (explored in Part 2), where modular agents collaborate via APIs and internal protocols to manage complex tasks.
These modes aren’t rigid silos: Kruti intelligently switches between them based on user intent, ensuring a fluid experience. For example, a query like "What’s the weather like, and book a cab if it’s raining" triggers Auto Mode for the weather check and Agents Mode for the cab booking.
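The switching behaviour described above can be sketched as a small router. This is a minimal illustration under stated assumptions, not Kruti’s actual implementation: the `Mode` enum, the keyword lists, and the `classify` and `route_query` helpers are all hypothetical, and a production system would presumably use an LLM-based intent classifier rather than keyword matching.

```python
from enum import Enum


class Mode(Enum):
    AUTO = "auto"
    IN_DEPTH = "in_depth"
    AGENTS = "agents"


# Hypothetical keyword hints standing in for a learned intent classifier.
AGENT_HINTS = ("book", "order", "pay")
IN_DEPTH_HINTS = ("explain", "history", "report", "tell me about")


def classify(clause: str) -> Mode:
    """Pick a mode for a single clause using simple keyword matching."""
    text = clause.lower()
    if any(hint in text for hint in AGENT_HINTS):
        return Mode.AGENTS
    if any(hint in text for hint in IN_DEPTH_HINTS):
        return Mode.IN_DEPTH
    # Anything else is treated as a quick, single-turn query.
    return Mode.AUTO


def route_query(query: str) -> list[tuple[str, Mode]]:
    """Split a compound query on ', and ' and route each clause separately."""
    clauses = [c.strip() for c in query.split(", and ") if c.strip()]
    return [(clause, classify(clause)) for clause in clauses]


plan = route_query("What's the weather like, and book a cab if it's raining")
for clause, mode in plan:
    print(f"{mode.value}: {clause}")
```

Splitting on a fixed separator is deliberately naive; it only serves to show one query being handled by two modes in sequence.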
Detail on the Agents
Kruti’s agents are specialised AI entities, each a master of its domain. Think of them as a team of digital experts working together to solve your problems. Below, we dive into their capabilities, integration with external services, and real-world applications.
Food Ordering Agent
- Capabilities: Integrates with platforms like ONDC to streamline food orders.
- Features:
  - Personalised Suggestions: Analyses past orders and preferences (e.g., "You love spicy food, try this Andhra biryani!").
  - Menu Navigation: Provides detailed options: "Restaurant Rajbhog offers malai kofta for ₹250 or kadhi for ₹180."
  - Order Execution: Places orders and tracks delivery status in real-time.
- Example Scenario: A busy parent says, "Order dinner for four, something quick and vegetarian." Kruti suggests nearby options, confirms a paneer pizza order, and notifies them when it’s 10 minutes away.
- Integration: Uses APIs to connect with delivery platforms, ensuring secure payment processing and live updates.
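To make the idea of personalised suggestions concrete, here is a minimal sketch that ranks menu items by how well their tags match a user’s order history. Everything in it is an assumption for illustration: the `Dish` type, the tag vocabulary, the `suggest` function, and the sample data (including the price of the past order) are hypothetical and say nothing about how Kruti actually scores options.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Dish:
    name: str
    price_inr: int
    tags: frozenset[str]


def suggest(menu: list[Dish], past_orders: list[Dish],
            required: set[str], top_n: int = 2) -> list[Dish]:
    """Rank dishes that carry all `required` tags by overlap with past-order tags."""
    # Count how often each tag appears in the user's order history.
    tag_counts = Counter(tag for dish in past_orders for tag in dish.tags)
    candidates = [d for d in menu if required <= d.tags]
    # Higher preference score first; the cheaper dish wins ties.
    return sorted(
        candidates,
        key=lambda d: (-sum(tag_counts[t] for t in d.tags), d.price_inr),
    )[:top_n]


# Illustrative sample data only.
menu = [
    Dish("Malai kofta", 250, frozenset({"vegetarian", "mild"})),
    Dish("Kadhi", 180, frozenset({"vegetarian", "spicy"})),
    Dish("Chicken biryani", 320, frozenset({"spicy", "rice"})),
]
past_orders = [Dish("Andhra veg biryani", 280, frozenset({"vegetarian", "spicy", "rice"}))]

picks = suggest(menu, past_orders, required={"vegetarian"})
print([d.name for d in picks])
```

The `required` filter models a hard constraint from the request ("something vegetarian"), while the tag counts model softer preferences learned from history.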
Cab Booking Agent
- Capabilities: Links with Ola cabs, offering a seamless transport solution.
- Features:
  - Option Comparison: "Ola sedan: ₹500, 5 mins away; Ola Mini: 7 mins away."
  - Booking and Tracking: Confirms the ride and provides driver details and ETA.
  - Preference Learning: Remembers you prefer AC cabs or budget rides.
- Example Scenario: "Book a cab to the airport at 7 PM." Kruti schedules it, sends details, and tracks the cab’s arrival.
- Integration: Employs Ola cabs and Ola Maps MCP (Model Context Protocol) to standardise communication with transport APIs.
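For readers unfamiliar with MCP, the sketch below shows roughly what a tool invocation looks like on the wire. MCP messages are JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The tool name `book_cab` and its argument names are hypothetical placeholders, not a published Ola schema, and `build_tool_call` is a helper invented for this example.

```python
import json
from itertools import count

# Simple monotonically increasing request IDs for the JSON-RPC envelope.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialise an MCP-style `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message, ensure_ascii=False)


# "book_cab" and its arguments are assumptions made for illustration.
request = build_tool_call(
    "book_cab",
    {"destination": "airport", "pickup_time": "19:00", "category": "sedan"},
)
print(request)
```

Because every tool shares this envelope, an agent can call a maps tool or a booking tool in the same way and only the `name` and `arguments` change.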
Bill Payments Agent
- Capabilities: Manages utility and service payments with precision.
- Features:
  - Bill Retrieval: Fetches amounts and due dates (e.g., "Your electricity bill is ₹1,200, due tomorrow").
  - Payment Processing: Uses UPI or saved cards to complete transactions.
  - Reminders: Alerts you: "Your water bill is due in 2 days, pay now?"
- Example Scenario: "Pay my internet bill." Kruti confirms the ₹800 payment and sends a receipt via WhatsApp.
- Integration: Connects to banking and utility APIs, ensuring end-to-end encryption for security.
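The reminder behaviour can be illustrated with a few lines of date arithmetic. This is a sketch under assumptions: the `Bill` type, the `reminder` function, the two-day window, and the sample dates and water bill amount are all hypothetical, and overdue bills are simply ignored here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Bill:
    name: str
    amount_inr: int
    due: date


def reminder(bill: Bill, today: date, window_days: int = 2) -> Optional[str]:
    """Return a reminder if the bill falls due within `window_days`, else None."""
    days_left = (bill.due - today).days
    # Skip overdue bills and bills that are still far off.
    if days_left < 0 or days_left > window_days:
        return None
    if days_left == 0:
        when = "today"
    elif days_left == 1:
        when = "tomorrow"
    else:
        when = f"in {days_left} days"
    return f"Your {bill.name} bill of ₹{bill.amount_inr:,} is due {when}, pay now?"


today = date(2025, 10, 13)
print(reminder(Bill("electricity", 1200, date(2025, 10, 14)), today))
print(reminder(Bill("water", 450, date(2025, 10, 15)), today))
```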
Image Generation Agent
- Capabilities: Generates India-centric images.
- Features:
  - Generate images
  - Edit images
  - Add custom Kruti style
- Example Scenario: "Generate an image of kids playing on the street in Kruti style."
Detail on Multimodal Capabilities
Kruti’s ability to process text, images, and speech makes it a versatile companion. This section explores the technology behind each input type and provides practical examples.
Text Processing
- Technology: Powered by advanced LLMs (Krutrim-2 and state-of-the-art open-source models), Kruti excels at understanding and generating text in 10 Indic languages, plus English.
- Features:
  - Intent Recognition: Parses queries like "Order dosa" to trigger the Food Ordering Agent.
  - Context Awareness: Remembers prior interactions within a session (e.g., "Make it quick" after asking about food).
  - Regional Adaptation: Handles slang and dialects; for example, "Bhai, kitna time lagega?" ("Brother, how long will it take?") in Hindi prompts a casual, accurate reply.
- Example: A user in Kerala types in Malayalam: "എനിക്ക് ഒരു ടാക്സി വേണം" ("I need a taxi"). Kruti books it and responds in Malayalam.
Image Processing
- Technology: Employs Chitrarth and state-of-the-art VLMs for image recognition and analysis.
- Features:
  - Document Scanning: Extracts text from bills, IDs, or receipts.
  - Object Identification: Recognises items or landmarks in photos.
  - Contextual Analysis: Pairs image data with user queries for richer responses.
- Example: "What’s this bill?" [Uploads electricity bill photo]. Kruti extracts: "Amount: ₹1,500, Due: Oct 15," and offers to pay it.
- Use Case: A tourist snaps a photo of the Gateway of India, asking, "Tell me about this." Kruti identifies it and provides a detailed history.
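The bill example above implies a step that turns recognised text into structured fields. The sketch below shows one simple way to do that with regular expressions, assuming the image has already been converted to text by a vision model or OCR step. The patterns, the `parse_bill` function, and the sample text are illustrative assumptions; real bills vary far more than this handles.

```python
import re

# Matches amounts such as "₹1,500", "Rs. 1500" or "Rs 1,500.50".
AMOUNT_RE = re.compile(r"(?:₹|Rs\.?)\s*([\d,]+(?:\.\d{1,2})?)")
# Matches "Due: Oct 15" or "Due Date - October 15" (month name, then day).
DUE_RE = re.compile(
    r"due(?:\s+date)?\s*[:\-]?\s*([A-Za-z]{3,9}\s+\d{1,2})", re.IGNORECASE
)


def parse_bill(ocr_text: str) -> dict:
    """Pull the first amount and due date out of already-extracted bill text."""
    amount = AMOUNT_RE.search(ocr_text)
    due = DUE_RE.search(ocr_text)
    return {
        "amount_inr": float(amount.group(1).replace(",", "")) if amount else None,
        "due": due.group(1) if due else None,
    }


# Hypothetical text as it might come back from the image model.
sample = "ELECTRICITY BILL\nTotal Amount Payable: ₹1,500\nDue Date: Oct 15"
print(parse_bill(sample))
```

Returning `None` for missing fields lets the assistant ask a follow-up question instead of guessing when the photo is unclear.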
Speech Processing
- Technology: Combines Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) models trained on Indian accents and languages.
- Features:
  - Multilingual Support: Understands multiple Indic languages.
  - Natural Responses: Generates human-like audio replies via TTS.
  - Noise Robustness: Filters background noise for clarity in crowded settings.
- Example: A user in a bustling market says, "Book a cab home" in Telugu. Kruti processes it, books the ride, and replies audibly: "Cab confirmed, arriving in 8 minutes."
- Accessibility: Vital for semi-literate users or those multitasking, e.g., a driver asking for directions hands-free.
Kruti’s multimodal system integrates these inputs seamlessly. For instance, a user might say, "Pay this bill," upload a photo, and type a follow-up, all handled in one conversation.
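One way to picture a single conversation spanning several input types is a shared turn log, where each input is first reduced to text (a speech transcript, text extracted from an image, or typed text) and tagged with its modality. The `Turn` and `Conversation` classes below are hypothetical and only illustrate the idea; the follow-up "Use UPI" is sample data, and the source does not describe Kruti’s internal representation.

```python
from dataclasses import dataclass, field
from typing import Literal

Modality = Literal["speech", "image", "text"]


@dataclass
class Turn:
    modality: Modality
    content: str  # transcript, text extracted from an image, or typed text


@dataclass
class Conversation:
    turns: list[Turn] = field(default_factory=list)

    def add(self, modality: Modality, content: str) -> None:
        """Append one user input, whatever its original form."""
        self.turns.append(Turn(modality, content))

    def context(self) -> str:
        """Flatten all turns into one prompt-ready string, in order."""
        return "\n".join(f"[{t.modality}] {t.content}" for t in self.turns)


convo = Conversation()
convo.add("speech", "Pay this bill")
convo.add("image", "Amount: ₹1,500, Due: Oct 15")
convo.add("text", "Use UPI")
print(convo.context())
```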
Points of Differentiation
Kruti stands out in the AI assistant space. Here’s how it compares to global players:
- End-to-End Transactional Power:
  - Kruti: Completes tasks like ordering food or booking travel.
  - Others: Primarily informational (e.g., ChatGPT explains recipes but can’t order them).
- India-Centric Design:
  - Kruti: Supports Indic languages, understands cultural nuances (e.g., festival-specific queries), and integrates with local services.
  - Others: Often English-centric with limited regional relevance.
- Agentic Architecture:
  - Kruti: Modular agents collaborate for complex workflows.
  - Others: Rely on single-threaded models or static APIs.
- Multimodal Mastery:
  - Kruti: Processes text, images, and speech cohesively.
  - Others: Typically text-only or limited in multimodal scope.
Models and Architecture
Kruti leverages multiple models to deliver an agentic experience across multimodal and multilingual use cases: Krutrim-2, the Chitrarth VLM, open-source models such as Llama, DeepSeek, and Qwen, and proprietary models such as Claude and GPT-4o, among others.
Conclusion
Kruti reimagines AI assistance with its intuitive modes, powerful agents, and multimodal prowess, all tailored to India’s needs. Whether you’re a student, professional, or small business owner, Kruti offers tools to simplify and enrich your life.
But what powers this innovation? In Part 2, we’ll dive deep into Kruti’s technical architecture, its agentic platform, AI models, and task-breaking mechanisms. Get ready for a behind-the-scenes look at how Kruti thinks, learns, and acts!