Overview
For companies maintaining software connector catalogs—JDBC, SaaS APIs, CRMs, ERPs—technical documentation is essential. Static documentation fails when users need fast, contextual answers. A Retrieval-Augmented Generation (RAG) system solves this problem.
This guide walks through design, challenges, and strategies to scale an assistant across 100+ connectors.

Step 1: Foundational RAG System—Chunk, Embed, Retrieve
What Was Built
1. Semantic Chunking
- Parsed HTML documentation into semantically meaningful units (method-level, endpoint-level, parameter blocks, examples)
- Avoided fixed-size chunking as it broke structure and context
- Each chunk included metadata: connectorName, docSection, filePath
2. Embedding
- Used transformer-based models (OpenAI, Cohere, etc.) to generate vector embeddings
- Focused on models preserving code and instruction format sensitivity
3. Vector Store
- Stored embeddings in a scalable store that supports filtering by connector or section
- Used hybrid search (vector + keyword filters) to improve retrieval precision
4. Query + Retrieval + Prompt + LLM Response
- Retrieved the top-k semantically similar chunks for every question
- Prompted the LLM with those chunks plus the question (a minimal sketch of the full pipeline follows this list)
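A minimal, self-contained sketch of this pipeline under the assumptions above. The field names mirror the chunk metadata, and the hashing-based embed function is only a stand-in so the example runs without an external API; the real system would call an embedding model such as OpenAI or Cohere here:

```python
from dataclasses import dataclass, field
import math

@dataclass
class Chunk:
    chunk_id: str
    text: str
    connector_name: str   # e.g. "QuickBooks"
    doc_section: str      # e.g. "Authentication"
    file_path: str
    embedding: list[float] = field(default_factory=list)

def embed(text: str) -> list[float]:
    """Stand-in for a real embedding model; hashing trick purely so the sketch runs."""
    vec = [0.0] * 64
    for token in text.lower().split():
        vec[hash(token) % 64] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalised, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, chunks: list[Chunk], connector: str | None = None, k: int = 3) -> list[Chunk]:
    """Hybrid retrieval: metadata filter first, then vector similarity ranking."""
    candidates = [c for c in chunks if connector is None or c.connector_name == connector]
    q = embed(query)
    return sorted(candidates, key=lambda c: cosine(q, c.embedding), reverse=True)[:k]

def build_prompt(question: str, context: list[Chunk]) -> str:
    """Assemble the retrieved chunks and the question into a single LLM prompt."""
    blocks = "\n\n".join(f"[{c.connector_name} / {c.doc_section}]\n{c.text}" for c in context)
    return f"Answer using only the documentation below.\n\n{blocks}\n\nQuestion: {question}"
```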
The Challenge
- Failed on multi-turn conversations
- Couldn’t disambiguate connectors referenced by aliases or abbreviations
- Didn’t include related context that lived in adjacent chunks
The system evolved from “just retrieval” to “structured understanding.”
Step 2: Add Session Context Awareness
Users often ask follow-up questions without repeating the connector or topic:
- “What does the batchSize parameter do?”
- “How do I authenticate?”
- “What authentication methods are supported?”
These are meaningless without understanding which connector or method the user references.
What Was Built
1. SessionContext Engine
- Tracked last discussed connector, topic, and user intent
- Classified every question as: NEW_TOPIC, FOLLOW_UP, or CLARIFICATION
2. Session Metadata Storage
```json
{
  "id": "033198ae-23fd-43dc-8abe-8ed1e0343149",
  "connector_type": "Quickbooks",
  "last_topic": "Authentication"
}
```
3. Query Routing
Each incoming query was routed through a classifier (combining rules and LLM calls) to determine whether it needed to reuse previous context, as sketched below.
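A rough sketch of how the session store and classifier could be wired together. The cue list and heuristics are purely illustrative; the actual system combines rules like these with LLM calls:

```python
from dataclasses import dataclass
from enum import Enum

class QueryType(Enum):
    NEW_TOPIC = "NEW_TOPIC"
    FOLLOW_UP = "FOLLOW_UP"
    CLARIFICATION = "CLARIFICATION"

@dataclass
class SessionContext:
    session_id: str
    connector_type: str | None = None   # e.g. "QuickBooks"
    last_topic: str | None = None       # e.g. "Authentication"

FOLLOW_UP_CUES = ("it", "this", "that", "the parameter", "how do i")

def classify(query: str, ctx: SessionContext) -> QueryType:
    """Crude rule-based first pass; an LLM call backs this up in production."""
    q = query.lower()
    if ctx.connector_type is None or ctx.connector_type.lower() in q:
        return QueryType.NEW_TOPIC          # connector stated explicitly, or no history yet
    if any(cue in q for cue in FOLLOW_UP_CUES) or len(q.split()) <= 6:
        return QueryType.FOLLOW_UP          # short or referential: lean on session history
    return QueryType.CLARIFICATION

def route(query: str, ctx: SessionContext) -> tuple[str, str | None]:
    """Return the query plus the connector to filter retrieval on."""
    if classify(query, ctx) in (QueryType.FOLLOW_UP, QueryType.CLARIFICATION):
        return query, ctx.connector_type    # reuse the remembered connector
    return query, None                      # resolve the connector afresh

ctx = SessionContext("033198ae", connector_type="QuickBooks", last_topic="Authentication")
print(route("What does the batchSize parameter do?", ctx))   # -> filters on QuickBooks
```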
Result
- Enabled multi-turn understanding
- Removed the burden on users of restating context with every question
- Evolved the RAG pipeline into a true conversational interface
Step 3: Include Neighbouring Chunks for Structural Context
Documentation isn’t flat. Relevant answers often span multiple sections:
- Chunk A: Function Signature
- Chunk B: Parameters
- Chunk C: Notes or Warnings
- Chunk D: Example Usage
Problem: RAG systems usually fetch only the best-matching chunks in isolation, which is brittle.
What Was Built
1. Linked Chunk Metadata
Each chunk stored previousChunkId and nextChunkId
2. Intelligent Neighbour Retrieval
- Fetched top-k chunks plus immediate neighbours
- Added logic to avoid duplicate or structurally irrelevant neighbours
- Neighbours were weighted slightly less than the primary match
3. Chunk Types
Introduced types like EXAMPLE, ERROR, REFERENCE, and PARAMS so that only relevant neighbours are included (see the sketch below)
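A sketch of the neighbour-expansion step, assuming retrieval returns (chunk, score) pairs and an in-memory index keyed by chunk ID; the 0.8 neighbour weight and the set of allowed neighbour types are illustrative choices:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: str
    text: str
    chunk_type: str                      # e.g. "PARAMS", "EXAMPLE", "ERROR", "REFERENCE"
    previous_chunk_id: str | None = None
    next_chunk_id: str | None = None

RELEVANT_NEIGHBOUR_TYPES = {"PARAMS", "EXAMPLE", "ERROR"}
NEIGHBOUR_WEIGHT = 0.8   # neighbours count slightly less than the primary match

def expand_with_neighbours(hits: list[tuple[Chunk, float]],
                           index: dict[str, Chunk]) -> list[tuple[Chunk, float]]:
    """Add immediate neighbours of each hit, skipping duplicates and irrelevant chunk types."""
    seen = {chunk.chunk_id for chunk, _ in hits}
    expanded = list(hits)
    for chunk, score in hits:
        for neighbour_id in (chunk.previous_chunk_id, chunk.next_chunk_id):
            if not neighbour_id or neighbour_id in seen:
                continue
            neighbour = index.get(neighbour_id)
            if neighbour is None or neighbour.chunk_type not in RELEVANT_NEIGHBOUR_TYPES:
                continue
            seen.add(neighbour_id)
            expanded.append((neighbour, score * NEIGHBOUR_WEIGHT))
    return sorted(expanded, key=lambda pair: pair[1], reverse=True)
```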
Result
- Created a richer context window for the LLM
- Answered questions like “What are the required fields for this endpoint?” even when the answer spanned multiple sections
Step 4: Resolve Connector Aliases and Abbreviations
Users don’t type full names:
- “QB account pull?” → “QuickBooks”
- “Sf refresh?” → “Salesforce”
This caused false negatives in retrieval and context tracking.
What Was Built: Two-layer Identification Mechanism
1. Deterministic Matching
- Built list of common aliases (QB → QuickBooks)
- Used lowercased string matching, fuzzy ratios, and n-gram token logic
2. LLM Fallback Resolution
- If the first layer failed, asked the LLM which known connector the user was referencing
- Revalidated the answer against the known connector list to avoid hallucinations (sketched below)
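A compact sketch of the two layers. The alias map and connector list below are illustrative subsets, and llm_resolve stands in for whatever callable wraps the LLM fallback:

```python
from difflib import SequenceMatcher

KNOWN_CONNECTORS = ["QuickBooks", "Salesforce", "NetSuite", "HubSpot"]   # illustrative subset
ALIASES = {"qb": "QuickBooks", "sf": "Salesforce", "ns": "NetSuite"}

def resolve_connector(user_text: str, llm_resolve=None) -> str | None:
    """Layer 1: deterministic alias + fuzzy matching. Layer 2: optional LLM fallback."""
    for token in user_text.lower().split():
        token = token.strip("?.,!:;")
        if token in ALIASES:
            return ALIASES[token]
        for name in KNOWN_CONNECTORS:
            if SequenceMatcher(None, token, name.lower()).ratio() >= 0.85:
                return name
    if llm_resolve is not None:
        guess = llm_resolve(user_text, KNOWN_CONNECTORS)   # ask the LLM to pick one
        if guess in KNOWN_CONNECTORS:                      # revalidate to block hallucinations
            return guess
    return None

print(resolve_connector("QB account pull?"))   # -> "QuickBooks"
print(resolve_connector("Sf refresh?"))        # -> "Salesforce"
```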
Result
- Resolved 99% of ambiguous connector names
- Greatly improved first-hit answer accuracy
Step 5: Query Expansion to Improve Recall
Users phrase questions differently than documentation:
“How do I list all errors?” vs. “Supported error codes”
What Was Built
For every query, generated 5–7 semantic variants using the LLM: paraphrases, synonyms, and intent restatements.

Original: "How to list supported errors?"
Expanded:
- "Error codes returned by the connector"
- "What are the possible failure codes?"
- "List of exceptions and meanings"
Process
- Performed vector retrieval for each variant
- Merged and deduplicated results
- Reranked by semantic similarity and documentation coverage (see the sketch below)
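A sketch of this expand-retrieve-merge loop, assuming a retrieve(query, k=...) callable that returns (chunk, score) pairs and a hypothetical llm_paraphrase helper; the rerank step is simplified here to keeping each chunk's best score across variants:

```python
def expand_query(question: str, llm_paraphrase, n_variants: int = 5) -> list[str]:
    """Generate paraphrase / synonym / intent-restatement variants via the LLM."""
    return [question] + llm_paraphrase(question, n_variants)

def retrieve_expanded(question: str, retrieve, llm_paraphrase, k: int = 5):
    """Run retrieval per variant, then merge, deduplicate, and rerank the results."""
    best = {}   # chunk_id -> (chunk, best score seen across all variants)
    for variant in expand_query(question, llm_paraphrase):
        for chunk, score in retrieve(variant, k=k):
            if chunk.chunk_id not in best or score > best[chunk.chunk_id][1]:
                best[chunk.chunk_id] = (chunk, score)
    return sorted(best.values(), key=lambda pair: pair[1], reverse=True)[:k]
```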
Result
- Massive improvement in retrieval coverage
- Enabled answers to flexible, user-style phrasing without forcing users to adopt documentation terminology
Outcomes
After implementing this layered design:
- Answer accuracy improved from ~45% to over 90%
- Connector name resolution accuracy hit 99%
- Smooth multi-turn conversations with no connector repetition required
- Designed for easy onboarding of new connectors
- Scaled across 100+ connectors, each with thousands of documentation lines
Lessons for Connector Companies
If your company maintains multiple technical connectors:
- Start with retrieval—but design for memory. Multi-turn context is non-negotiable.
- Don’t trust user input at face value. Resolve connector names, typos, and abbreviations.
- Docs are structured—respect that. Neighbour chunks matter more than you think.
- Language is flexible. Your system should be too. Synonym and paraphrase handling is critical for open-ended queries.
Final Thoughts
Static documentation is necessary but insufficient. A RAG system built with structure, memory, and intelligence bridges the gap between product complexity and user simplicity.
For connector ecosystems—where each product has its own nuances—this layered approach works and scales.