Production agentic AI for Indian accounting
We built a six-layer hallucination defence, hybrid retrieval (vector + BM25 + Voyage rerank), and a multi-tenant cost-capped agent runtime — for a domain where wrong answers cost real money. End-to-end hallucination rate measured at ~0.3%.
Six-layer hallucination defence
Standard prompt-engineering tricks (RAG, temp=0, schema validation) miss too many cases when answers cost real money. We layered them:
- Retrieval grounding — every tax claim must come from an indexed government regulation chunk; the agent refuses if no chunk above similarity 0.5 is found
- Schema-constrained output — Zod → JSON Schema enforced via Anthropic structured outputs; the model cannot return
gst_rate: "eighteen percent" - Referential probes — every tool call’s factual claims (HSN code, GST rate, GSTIN format) checked against a master table BEFORE execution; failure → self-correct loop
- Idempotent writes — every mutating tool carries a client-supplied
idempotency_key; double-firing is a silent no-op - Append-only audit log — hash-chained
audit_logtable with chain-of-thought capture; replayable for any conversation - Bounded cost ceiling — per-business monthly cap (default ₹500); model router downgrades Opus → Sonnet → Haiku as budget tightens; hard step-cap on agent loops
Hybrid retrieval pipeline
Pure vector retrieval misses queries with rare proper nouns (Section 269ST, specific HSN codes). Pure BM25 misses paraphrases. We combine both with Reciprocal Rank Fusion + Voyage rerank.
User query
│
├──► OpenAI text-embedding-3-small (1536-d)
│ │
│ ▼
│ pgvector HNSW → top-30 vector candidates
│
└──► PG tsvector ts_rank → top-30 BM25 candidates
│
▼
RRF fusion (k=60)
│
▼
Voyage rerank-2.5-lite
│
▼
top-5 to LLM with citationsMeasured on a 30-query Indian-tax benchmark: recall@5 = 0.71 (vector only) → 0.85 (with rerank) → 0.91 (hybrid + rerank). MRR climbed from 0.57 → 0.81.
Open source
We extracted the production-grade primitives into two npm packages so other teams building agentic AI can build on the same substrate:
@ongravy/agent-kit
RRF · eval metrics · model router · prompt-cache planner · MCP format converter. Zero runtime deps.
npm install @ongravy/agent-kit@ongravy/mcp-server
MCP server exposing 60+ accounting tools to Claude Desktop, Cursor, Zed, and other MCP clients.
npm install -g @ongravy/mcp-serverIndianTaxAgentBench
We are building the first public benchmark for AI agents on Indian tax & compliance tasks: 312 hand-labelled queries across closed-form factual recall, multi-hop reasoning, and tool-using execution. Open-source dataset + eval harness. Paper draft on arXiv soon.
Building this together
OnGravy is led by Pratik Revankar. Solo-built since early 2025; launching June 2026 with a design-partner program for chartered accountants.
If you are a CA interested in the design-partner program, or an engineer interested in working on production agentic AI: revankarpratik1995@gmail.com.