Overview
For companies maintaining software connector catalogs—JDBC, SaaS APIs, CRMs, ERPs—technical documentation is essential. Static documentation fails when users need fast, contextual answers. A Retrieval-Augmented Generation (RAG) system solves this problem.
This guide walks through design, challenges, and strategies to scale an assistant across 100+ connectors.

Step 1: Foundational RAG System—Chunk, Embed, Retrieve
What Was Built
1. Semantic Chunking
- Parsed HTML documentation into semantically meaningful units (method-level, endpoint-level, parameter blocks, examples)
- Avoided fixed-size chunking as it broke structure and context
- Each chunk included metadata: connectorName, docSection, filePath
2. Embedding
- Used transformer-based models (OpenAI, Cohere, etc.) to generate vector embeddings
- Focused on models preserving code and instruction format sensitivity
3. Vector Store
- Stored embeddings in a scalable store that supports filtering by connector or section
- Used hybrid search (vector + keyword filters) to improve retrieval precision
4. Query + Retrieval + Prompt + LLM Response
- Retrieved the top-k semantically similar chunks for every question
- Prompted the LLM with those chunks plus the question (a minimal sketch of the full pipeline follows this list)
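A minimal, self-contained sketch of this pipeline under the assumptions above. The field names mirror the chunk metadata, and the hashing-based embed function is only a stand-in so the example runs without an external API; the real system would call an embedding model such as OpenAI or Cohere here:

```python
from dataclasses import dataclass, field
import math

@dataclass
class Chunk:
    chunk_id: str
    text: str
    connector_name: str   # e.g. "QuickBooks"
    doc_section: str      # e.g. "Authentication"
    file_path: str
    embedding: list[float] = field(default_factory=list)

def embed(text: str) -> list[float]:
    """Stand-in for a real embedding model; hashing trick purely so the sketch runs."""
    vec = [0.0] * 64
    for token in text.lower().split():
        vec[hash(token) % 64] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalised, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, chunks: list[Chunk], connector: str | None = None, k: int = 3) -> list[Chunk]:
    """Hybrid retrieval: metadata filter first, then vector similarity ranking."""
    candidates = [c for c in chunks if connector is None or c.connector_name == connector]
    q = embed(query)
    return sorted(candidates, key=lambda c: cosine(q, c.embedding), reverse=True)[:k]

def build_prompt(question: str, context: list[Chunk]) -> str:
    """Assemble the retrieved chunks and the question into a single LLM prompt."""
    blocks = "\n\n".join(f"[{c.connector_name} / {c.doc_section}]\n{c.text}" for c in context)
    return f"Answer using only the documentation below.\n\n{blocks}\n\nQuestion: {question}"
```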
The Challenge
- Failed on multi-turn conversations
- Couldn’t disambiguate connectors referenced by aliases or abbreviations
- Didn’t include related context that lived in adjacent chunks
The system evolved from “just retrieval” to “structured understanding.”
Step 2: Add Session Context Awareness
Users often ask follow-up questions without repeating the connector or topic:
- “What does the batchSize parameter do?”
- “How do I authenticate?”
- “What authentication methods are supported?”
These are meaningless without understanding which connector or method the user references.
What Was Built
1. SessionContext Engine
- Tracked last discussed connector, topic, and user intent
- Classified every question as: NEW_TOPIC, FOLLOW_UP, or CLARIFICATION
2. Session Metadata Storage
```json
{
  "id": "033198ae-23fd-43dc-8abe-8ed1e0343149",
  "connector_type": "Quickbooks",
  "last_topic": "Authentication"
}
```
3. Query Routing
Each incoming query was routed through a classifier (combining rules and LLM calls) to determine whether it needed to reuse previous context, as sketched below.
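A rough sketch of how the session store and classifier could be wired together. The cue list and heuristics are purely illustrative; the actual system combines rules like these with LLM calls:

```python
from dataclasses import dataclass
from enum import Enum

class QueryType(Enum):
    NEW_TOPIC = "NEW_TOPIC"
    FOLLOW_UP = "FOLLOW_UP"
    CLARIFICATION = "CLARIFICATION"

@dataclass
class SessionContext:
    session_id: str
    connector_type: str | None = None   # e.g. "QuickBooks"
    last_topic: str | None = None       # e.g. "Authentication"

FOLLOW_UP_CUES = ("it", "this", "that", "the parameter", "how do i")

def classify(query: str, ctx: SessionContext) -> QueryType:
    """Crude rule-based first pass; an LLM call backs this up in production."""
    q = query.lower()
    if ctx.connector_type is None or ctx.connector_type.lower() in q:
        return QueryType.NEW_TOPIC          # connector stated explicitly, or no history yet
    if any(cue in q for cue in FOLLOW_UP_CUES) or len(q.split()) <= 6:
        return QueryType.FOLLOW_UP          # short or referential: lean on session history
    return QueryType.CLARIFICATION

def route(query: str, ctx: SessionContext) -> tuple[str, str | None]:
    """Return the query plus the connector to filter retrieval on."""
    if classify(query, ctx) in (QueryType.FOLLOW_UP, QueryType.CLARIFICATION):
        return query, ctx.connector_type    # reuse the remembered connector
    return query, None                      # resolve the connector afresh

ctx = SessionContext("033198ae", connector_type="QuickBooks", last_topic="Authentication")
print(route("What does the batchSize parameter do?", ctx))   # -> filters on QuickBooks
```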
Result
- Enabled multi-turn understanding
- Removed the burden on users of restating context with every question
- Evolved the RAG pipeline into a true conversational interface
Step 3: Include Neighbouring Chunks for Structural Context
Documentation isn’t flat. Relevant answers often span multiple sections:
- Chunk A: Function Signature
- Chunk B: Parameters
- Chunk C: Notes or Warnings
- Chunk D: Example Usage
Problem: RAG systems usually fetch only the best-matching chunks in isolation, which is brittle.
What Was Built
1. Linked Chunk Metadata
Each chunk stored previousChunkId and nextChunkId
2. Intelligent Neighbour Retrieval
- Fetched top-k chunks plus immediate neighbours
- Added logic to avoid duplicate or structurally irrelevant neighbours
- Neighbours were weighted slightly less than the primary match
3. Chunk Types
Introduced types like EXAMPLE, ERROR, REFERENCE, and PARAMS so that only relevant neighbours are included (see the sketch below)
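A sketch of the neighbour-expansion step, assuming retrieval returns (chunk, score) pairs and an in-memory index keyed by chunk ID; the 0.8 neighbour weight and the set of allowed neighbour types are illustrative choices:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: str
    text: str
    chunk_type: str                      # e.g. "PARAMS", "EXAMPLE", "ERROR", "REFERENCE"
    previous_chunk_id: str | None = None
    next_chunk_id: str | None = None

RELEVANT_NEIGHBOUR_TYPES = {"PARAMS", "EXAMPLE", "ERROR"}
NEIGHBOUR_WEIGHT = 0.8   # neighbours count slightly less than the primary match

def expand_with_neighbours(hits: list[tuple[Chunk, float]],
                           index: dict[str, Chunk]) -> list[tuple[Chunk, float]]:
    """Add immediate neighbours of each hit, skipping duplicates and irrelevant chunk types."""
    seen = {chunk.chunk_id for chunk, _ in hits}
    expanded = list(hits)
    for chunk, score in hits:
        for neighbour_id in (chunk.previous_chunk_id, chunk.next_chunk_id):
            if not neighbour_id or neighbour_id in seen:
                continue
            neighbour = index.get(neighbour_id)
            if neighbour is None or neighbour.chunk_type not in RELEVANT_NEIGHBOUR_TYPES:
                continue
            seen.add(neighbour_id)
            expanded.append((neighbour, score * NEIGHBOUR_WEIGHT))
    return sorted(expanded, key=lambda pair: pair[1], reverse=True)
```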
Result
- Created a richer context window for the LLM
- Answered questions like “What are the required fields for this endpoint?” even when the answer spanned multiple sections
Step 4: Resolve Connector Aliases and Abbreviations
Users don’t type full names:
- “QB account pull?” → “QuickBooks”
- “Sf refresh?” → “Salesforce”
This caused false negatives in retrieval and context tracking.
What Was Built: Two-layer Identification Mechanism
1. Deterministic Matching
- Built list of common aliases (QB → QuickBooks)
- Used lowercased string matching, fuzzy ratios, and n-gram token logic
2. LLM Fallback Resolution
- If the first layer failed, asked the LLM which known connector the user was referencing
- Revalidated the answer against the known connector list to avoid hallucinations (sketched below)
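A compact sketch of the two layers. The alias map and connector list below are illustrative subsets, and llm_resolve stands in for whatever callable wraps the LLM fallback:

```python
from difflib import SequenceMatcher

KNOWN_CONNECTORS = ["QuickBooks", "Salesforce", "NetSuite", "HubSpot"]   # illustrative subset
ALIASES = {"qb": "QuickBooks", "sf": "Salesforce", "ns": "NetSuite"}

def resolve_connector(user_text: str, llm_resolve=None) -> str | None:
    """Layer 1: deterministic alias + fuzzy matching. Layer 2: optional LLM fallback."""
    for token in user_text.lower().split():
        token = token.strip("?.,!:;")
        if token in ALIASES:
            return ALIASES[token]
        for name in KNOWN_CONNECTORS:
            if SequenceMatcher(None, token, name.lower()).ratio() >= 0.85:
                return name
    if llm_resolve is not None:
        guess = llm_resolve(user_text, KNOWN_CONNECTORS)   # ask the LLM to pick one
        if guess in KNOWN_CONNECTORS:                      # revalidate to block hallucinations
            return guess
    return None

print(resolve_connector("QB account pull?"))   # -> "QuickBooks"
print(resolve_connector("Sf refresh?"))        # -> "Salesforce"
```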
Result
- Resolved 99% of ambiguous connector names
- Greatly improved first-hit answer accuracy
Step 5: Query Expansion to Improve Recall
Users phrase questions differently than documentation:
“How do I list all errors?” vs. “Supported error codes”
What Was Built
For every query, generated 5–7 semantic variants using the LLM: paraphrases, synonyms, and intent restatements.

Original: "How to list supported errors?"
Expanded:
- "Error codes returned by the connector"
- "What are the possible failure codes?"
- "List of exceptions and meanings"
Process
- Performed vector retrieval for each variant
- Merged and deduplicated results
- Reranked by semantic similarity and documentation coverage (see the sketch below)
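A sketch of this expand-retrieve-merge loop, assuming a retrieve(query, k=...) callable that returns (chunk, score) pairs and a hypothetical llm_paraphrase helper; the rerank step is simplified here to keeping each chunk's best score across variants:

```python
def expand_query(question: str, llm_paraphrase, n_variants: int = 5) -> list[str]:
    """Generate paraphrase / synonym / intent-restatement variants via the LLM."""
    return [question] + llm_paraphrase(question, n_variants)

def retrieve_expanded(question: str, retrieve, llm_paraphrase, k: int = 5):
    """Run retrieval per variant, then merge, deduplicate, and rerank the results."""
    best = {}   # chunk_id -> (chunk, best score seen across all variants)
    for variant in expand_query(question, llm_paraphrase):
        for chunk, score in retrieve(variant, k=k):
            if chunk.chunk_id not in best or score > best[chunk.chunk_id][1]:
                best[chunk.chunk_id] = (chunk, score)
    return sorted(best.values(), key=lambda pair: pair[1], reverse=True)[:k]
```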
Result
- Massive improvement in retrieval coverage
- Enabled answers to flexible, user-style phrasing without forcing users to adopt documentation terminology
Outcomes
After implementing this layered design:
- Answer accuracy improved from ~45% to over 90%
- Connector name resolution accuracy hit 99%
- Smooth multi-turn conversations with no connector repetition required
- Designed for easy onboarding of new connectors
- Scaled across 100+ connectors, each with thousands of documentation lines
Lessons for Connector Companies
If your company maintains multiple technical connectors:
- Start with retrieval—but design for memory. Multi-turn context is non-negotiable.
- Don’t trust user input at face value. Resolve connector names, typos, and abbreviations.
- Docs are structured—respect that. Neighbour chunks matter more than you think.
- Language is flexible. Your system should be too. Synonym and paraphrase handling is critical for open-ended queries.
Final Thoughts
Static documentation is necessary but insufficient. A RAG system built with structure, memory, and intelligence bridges the gap between product complexity and user simplicity.
For connector ecosystems—where each product has its own nuances—this layered approach works and scales.