Sahai

Enterprise-grade AI support platform that scales customer service

Product: B2B SaaS Platform
Timeline: 8 Weeks
Status: Live
Stack: Next.js 15, TypeScript, Convex, OpenAI GPT-4o-mini, Vapi, AWS Secrets Manager, Clerk, Turborepo

The Story

The Problem We Set Out to Solve

Customer support is broken for modern businesses. Companies face an impossible choice: hire expensive support teams that can't scale, or use chatbots that frustrate customers with generic, unhelpful responses. The middle ground — intelligent, context-aware AI that knows when to escalate to humans — didn't exist in an accessible, production-ready form.

The numbers told the story. Support teams spend 60% of their time answering repetitive questions that could be automated, while customers wait hours (or days) for responses. Meanwhile, the AI chatbots flooding the market felt exactly like what they were: scripted decision trees wearing the label "AI."

We built Sahai to solve this: a B2B AI support platform where businesses deploy intelligent agents that understand their documentation, handle conversations naturally, and seamlessly escalate to human operators when needed.

Starting with the Conversation

The project began with a deceptively simple question: what does truly helpful AI support feel like? Not a bot that throws knowledge base articles at you, but one that searches, understands context, and knows its limits.

We spent the first two weeks exclusively on prompt engineering and conversation design. The support agent prompt went through 47 iterations. The breakthrough came when we stopped trying to make the AI sound "professional" and instead focused on making it genuinely useful — knowing when to search, when to escalate, and critically, when to admit it doesn't know something.

The name Sahai (meaning "companion" in Hindi) reflects this philosophy: it's not a replacement for human support, but a partner that handles what it can and brings in humans when it matters.

Technical Architecture: Building for Scale

Sahai is a production-grade monorepo built with Turborepo, containing three core applications:

The Stack

Frontend:

  • Dashboard (Next.js 15) — A real-time operator interface where support teams monitor conversations, intercept chats, and manage their knowledge base
  • Widget (Next.js 15) — Customer-facing chat and voice interface that embeds into any website
  • Embed Script (Vite) — Lightweight JavaScript that initializes the widget with a single line of code

Backend:

  • Convex — Real-time database and backend that powers instant message sync, live conversation updates, and sub-100ms query responses
  • OpenAI GPT-4o-mini — Powers the AI agent with RAG (Retrieval Augmented Generation)
  • Vapi — Handles voice AI for phone and web-based voice support
  • AWS Secrets Manager — Secure, encrypted credential storage for multi-tenant API keys
  • Clerk — Authentication, organization management, and subscription billing

Key Architectural Decisions

1. Multi-Tenant from Day One

Every design decision centered on tenant isolation. Organizations get:

  • Separate namespaces for knowledge bases (RAG entries scoped by org ID)
  • Isolated credential storage (each tenant brings their own Vapi API keys)
  • Independent subscription management via Clerk
  • Complete data separation at the database level

This wasn't an afterthought — it's baked into the schema. Every query filters by organizationId, every conversation is scoped, every file upload writes to an org-specific namespace.
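
A minimal sketch of the scoping pattern (types and names are illustrative, not the production schema; the real queries go through Convex indexes keyed on organizationId):

```typescript
// Every record carries an organizationId; a scoping helper filters reads
// before any business logic runs, so rows from other orgs are never visible.
type Conversation = { id: string; organizationId: string; status: string };

function scopedTo<T extends { organizationId: string }>(
  orgId: string,
  rows: T[],
): T[] {
  return rows.filter((row) => row.organizationId === orgId);
}

const all: Conversation[] = [
  { id: "c1", organizationId: "org_123", status: "unresolved" },
  { id: "c2", organizationId: "org_456", status: "resolved" },
];

// scopedTo("org_123", all) returns only c1; c2 belongs to another tenant.
```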

2. RAG Implementation That Actually Works

Most RAG implementations fail because they treat embeddings as magic. Ours succeeds because we focused on the hard parts:

// Each file upload:
1. Extracts text content (PDF, DOCX, TXT, etc.)
2. Generates embeddings using OpenAI's text-embedding-3-large (1536 dimensions)
3. Stores with content hash to prevent duplicates
4. Indexes in a namespace-specific vector database

// When the AI searches:
1. Embeds the user's query
2. Performs vector similarity search (scoped to org namespace)
3. Retrieves top 5 most relevant passages
4. Uses a specialized "interpreter" prompt to synthesize an answer
5. Returns citation metadata for transparency
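
Two pure pieces of this pipeline, dedupe-by-hash and top-k similarity, can be sketched as follows (function names are illustrative; production uses Convex's vector index rather than an in-memory search):

```typescript
import { createHash } from "node:crypto";

// Content hash used to skip re-embedding duplicate uploads.
function contentHash(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Retrieve the k passages most similar to the embedded query (k = 5 in Sahai).
function topK(
  query: number[],
  entries: { text: string; embedding: number[] }[],
  k = 5,
): string[] {
  return [...entries]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k)
    .map((e) => e.text);
}
```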

The magic isn't the embeddings — it's the prompt design. Our SEARCH_INTERPRETER_PROMPT teaches the AI to:

  • Extract specific information from search results
  • Admit when results are vague or missing
  • Proactively offer human support when it can't help
  • Never fabricate information not found in search results

3. Smart Escalation System

The AI has three tools at its disposal:

searchTool — Queries the knowledge base for information. Called automatically for any product/service question.

escalateConversationTool — Transfers conversation to a human operator. Triggered when:

  • Customer explicitly asks for a human
  • AI detects frustration in language
  • Search returns no relevant results
  • Issue requires judgment beyond documentation

resolveConversationTool — Marks conversation complete. Called when customer confirms their issue is resolved.

The conversation flow is encoded in the system prompt with specific trigger phrases and escalation criteria. This isn't hard-coded logic — it's learned behavior from a carefully crafted prompt that embodies support best practices.
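
The workflow these tools drive can be sketched as a pure transition function (a simplification; the real handlers also persist state to Convex and notify operators):

```typescript
// Conversation status workflow: Unresolved → Escalated → Resolved.
type ConversationStatus = "unresolved" | "escalated" | "resolved";
type ToolName = "searchTool" | "escalateConversationTool" | "resolveConversationTool";

function applyTool(status: ConversationStatus, tool: ToolName): ConversationStatus {
  switch (tool) {
    case "searchTool":
      return status; // searching never changes the workflow state
    case "escalateConversationTool":
      return "escalated"; // hand off to a human operator
    case "resolveConversationTool":
      return "resolved"; // customer confirmed the issue is fixed
  }
}
```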

4. Real-Time Operator Dashboard

The operator experience was as critical as the customer experience. We built a dashboard that feels like Intercom or Front, but with AI augmentation:

  • Live conversation list with real-time updates (Convex's reactive queries make this trivial)
  • Rich customer context: location (via timezone), device info, referrer URL, browser details
  • Conversation status workflow: Unresolved → Escalated → Resolved
  • AI-enhanced responses: operators can click "Enhance" to improve grammar, tone, and clarity
  • Infinite scroll on both conversations and messages (pagination with cursor-based loading)
  • Message attribution: operators see who sent each message (user vs AI vs operator)

The enhance feature is subtle but powerful. Operators type quickly under pressure. The AI cleans up typos, improves clarity, and ensures consistent tone — all while preserving the operator's intent.

The Hardest Parts

1. Secure Multi-Tenant Credential Management

Early on, we hit a wall: how do businesses bring their own Vapi API keys securely?

Storing them in the database was a non-starter (breach risk). Environment variables don't scale to one key per tenant. The solution: AWS Secrets Manager.

Each tenant's credentials are:

  • Encrypted at rest by AWS
  • Retrieved only when needed
  • Never exposed to the frontend
  • Scoped to the organization making the request

The implementation is elegant:

// When a tenant connects Vapi:
1. Operator enters public and private API keys
2. Keys are encrypted and stored in AWS Secrets Manager
3. A reference to the secret is saved in the database
4. When the widget needs to make a voice call, backend retrieves keys server-side
5. Keys are used to authenticate with Vapi, then discarded
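
The retrieval step can be sketched with the client injected, so the flow is visible without real AWS credentials (the secret-naming scheme and types here are illustrative assumptions; production would use @aws-sdk/client-secrets-manager):

```typescript
// Minimal interface standing in for the AWS Secrets Manager client.
type SecretsClient = { getSecretValue(id: string): Promise<string> };

// Illustrative naming scheme: one secret per tenant.
function secretIdFor(orgId: string): string {
  return `sahai/vapi/${orgId}`;
}

// Runs server-side only: keys are parsed, used to authenticate with Vapi,
// and never sent to the widget.
async function fetchVapiKeys(
  client: SecretsClient,
  orgId: string,
): Promise<{ publicKey: string; privateKey: string }> {
  const raw = await client.getSecretValue(secretIdFor(orgId));
  return JSON.parse(raw);
}
```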

This "Bring Your Own Key" (BYOK) model was non-negotiable for enterprise customers. We couldn't ask them to trust us with their API keys unless we could prove those keys were as safe with us as in their own infrastructure.

2. Embeddable Widget That Works Everywhere

The widget needed to work on any website — React apps, WordPress blogs, vanilla HTML pages — without conflicts or dependencies.

The solution: a self-contained embed script that:

  • Loads in an isolated iframe to prevent CSS conflicts
  • Communicates via postMessage for security
  • Handles microphone permissions for voice calls
  • Self-destructs on navigation (no memory leaks)
  • Exposes a tiny API (SahaiWidget.show(), .hide(), .destroy())

The embed code is a single script tag:

<script src="https://embed.sahai.com/embed.js" 
        data-organization-id="org_123"></script>
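
Under the hood, that tag resolves to roughly the following (the widget host URL and message shape are assumptions for illustration, not the real endpoints):

```typescript
// Commands the host page can send via SahaiWidget.show()/.hide()/.destroy().
type WidgetCommand = "show" | "hide" | "destroy";

// The embed script reads data-organization-id from its own <script> tag and
// points an isolated iframe at a URL like this one (hypothetical host).
function widgetUrl(orgId: string): string {
  return `https://widget.sahai.com/?organizationId=${encodeURIComponent(orgId)}`;
}

// Host-page commands are forwarded into the iframe via postMessage, so the
// host never touches widget internals and CSS stays fully isolated.
function commandMessage(cmd: WidgetCommand): { source: string; command: WidgetCommand } {
  return { source: "sahai-embed", command: cmd };
}
```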

Getting this to work reliably across browsers, with proper permissions handling, and maintaining performance took longer than building the entire backend.

3. Voice AI Integration Without Breaking the Bank

Voice AI is expensive. Every minute of Vapi usage costs money. We needed to:

  • Only initialize voice calls when explicitly requested (not eagerly loaded)
  • Stream transcripts in real-time so users see what's happening
  • Handle call state management (connecting, speaking, listening, ended)
  • Support both web calls and phone numbers

The Vapi integration taught us that integrating third-party AI services is 20% API calls and 80% error handling, state management, and UX polish. The hardest part wasn't making a call — it was making the calling experience feel native and reliable.
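
The call lifecycle above can be modeled as a small state machine (event names here are illustrative, not Vapi's actual SDK events):

```typescript
// Call states the widget UI renders: connecting, speaking, listening, ended.
type CallState = "idle" | "connecting" | "speaking" | "listening" | "ended";
type CallEvent = "start" | "connected" | "assistant-speaks" | "user-speaks" | "hangup";

function nextCallState(state: CallState, event: CallEvent): CallState {
  switch (event) {
    case "start":
      return state === "idle" ? "connecting" : state; // never start twice
    case "connected":
      return state === "connecting" ? "listening" : state;
    case "assistant-speaks":
      return "speaking";
    case "user-speaks":
      return "listening";
    case "hangup":
      return "ended";
  }
}
```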

Subscription Architecture

We integrated Clerk's billing system with a subscription enforcement layer:

  • Free tier: Basic access, limited team size
  • Pro tier: Unlocks AI enhancements, file uploads, voice calls, larger teams

Enforcement happens at the backend:

// Before expensive operations:
1. Check if organization has active subscription
2. Return "subscription required" error if inactive
3. Frontend shows upgrade prompts for premium features

The webhook handler listens for Clerk subscription events and updates team size limits automatically:

const maxAllowedMemberships =
  subscription.status === "active" ? 100 : 1;

This tight integration means businesses can upgrade and immediately access features — no manual provisioning, no delays.

What We Learned

1. Multi-Tenancy is 10x Harder Than You Think

Every feature required asking: "How does this work across organizations?" Database indexes, API authentication, file uploads, conversation routing — all needed org-scoping. We caught bugs where one tenant could theoretically access another's data. None of them shipped, but only because we tested every query with a healthy dose of paranoia.

2. RAG is More Prompt Engineering Than ML

The embeddings model is off-the-shelf OpenAI. The vector search is provided by Convex RAG. The actual intelligence is in the prompts:

  • How you instruct the AI to search
  • How you teach it to interpret results
  • When you tell it to admit ignorance

We spent more time on the SEARCH_INTERPRETER_PROMPT than on the entire embedding pipeline.

3. Real-Time Makes Everything Better

Using Convex's reactive queries transformed the operator experience. When a customer sends a message, operators see it instantly. When the AI escalates, the conversation jumps to "Escalated" view immediately. There's no polling, no refresh button — it just works. This real-time quality is the difference between a product that feels sluggish and one that feels alive.

4. Security is a Feature, Not a Checkbox

AWS Secrets Manager, Clerk authentication, content security policies, CORS configuration, input sanitization — these aren't "nice to have" in B2B SaaS. They're table stakes. Enterprises won't adopt your product unless you can prove you take security seriously. We spent 20% of development time on security alone.

The Result

Sahai is a production-grade B2B AI support platform that businesses can deploy in under 10 minutes. It combines:

  • Intelligent AI that knows when to help and when to escalate
  • Real-time operator tools that make human support 10x more efficient
  • Enterprise security with encrypted credentials and multi-tenant isolation
  • Voice + chat in a single unified interface
  • Knowledge base RAG that learns from documentation
  • One-line integration that works on any website

But more than the features, Sahai represents a philosophy: AI should augment human support, not replace it. The best support experience is one where AI handles the routine so humans can focus on the complex, nuanced, high-empathy interactions where they excel.

We built Sahai because we believe the future of customer support is collaborative — humans and AI working together, each doing what they do best.


Tech Stack Breakdown:

  • Frontend: Next.js 15, TypeScript, TailwindCSS, shadcn/ui
  • Backend: Convex (real-time DB + serverless functions)
  • AI: OpenAI GPT-4o-mini, OpenAI Embeddings (text-embedding-3-large)
  • Voice: Vapi AI
  • Auth: Clerk (organizations + billing)
  • Security: AWS Secrets Manager
  • Deployment: Vercel (web + widget), Vite (embed script)
  • Monitoring: Sentry
  • Monorepo: Turborepo + pnpm

Key Metrics:

  • Sub-100ms query response times (Convex)
  • 24-hour contact session persistence
  • 1536-dimension embeddings for RAG
  • Vector search with top-5 result retrieval
  • Real-time message streaming
  • Infinite scroll pagination (10 items per load)
  • Organization-scoped namespaces
  • Encrypted credential storage