genai-lab
Exposes tools for semantic search and RAG-based Q&A from a local knowledge base, along with resources and prompt templates.
GenAI Lab
GenAI Lab implements a TypeScript-based GenAI agent workflow with semantic search, Retrieval-Augmented Generation (RAG), MCP tools/resources/prompts, and LangGraph-based orchestration.
The project focuses on the technical mechanics behind controlled GenAI systems: retrieval, grounding, structured planning, graph-based routing, bounded retries, and MCP tool exposure.
Technical Capabilities
- Embedding-based semantic search over a local knowledge base
- Retrieval-Augmented Generation with source-grounded responses
- Source citation support using note IDs
- LangGraph workflow orchestration with explicit shared state
- Structured LLM planning with constrained execution steps
- Retrieval-query extraction before semantic search
- Conditional graph routing based on planner output and retrieval state
- Decomposed RAG flow: retrieval node followed by answer/draft generation nodes
- Bounded retry path for low/no retrieval results
- Safe fallback behavior for unsupported or ungrounded requests
- MCP server exposing tools, resources, and prompt templates
- AI SDK MCP client flow for LLM-driven MCP tool selection
System Architecture
Agent workflow
User request
→ planner node
→ conditional routing
→ retrieve context when needed
→ answer question OR draft message
→ retry/fallback when needed
→ final response
MCP flow
LLM client
→ discovers MCP tools
→ chooses search_notes / answer_from_notes
→ MCP server executes backend logic
→ result returns to LLM
→ final response
Setup
Install dependencies:
pnpm install
This project requires an OpenAI API key.
Create .env.local:
OPENAI_API_KEY=your_openai_api_key
Supported Example Flows
| Flow | Path | What it demonstrates | Example command |
|---|---|---|---|
| Grounded Q&A flow | planner → retrieve_notes → answer_question → final_response | Answers using retrieved context and source citations | pnpm agent "Can you explain why retrieval should happen before generation?" |
| Retrieval-augmented drafting flow | planner → retrieve_notes → draft_message → final_response | Retrieves relevant context before generating a draft | pnpm agent "Use my notes about token-heavy conversations to write a short team update" |
| Direct drafting flow | planner → draft_message → final_response | Generates a draft without retrieval when no knowledge lookup is needed | pnpm agent "Draft a Slack message saying QA signoff is pending" |
| Retrieval-only flow | planner → retrieve_notes → final_response | Searches notes and returns grounded matching context | pnpm agent "Search saved notes for deterministic backend functions" |
| Fallback flow | planner → final_response | Avoids unsupported answers when no tool/knowledge path applies | pnpm agent "Tell me something funny" |
| Standalone semantic search | query → embedding → similarity ranking → notes | Runs vector similarity search directly | pnpm semantic-search "backend-defined tools and input schemas" |
| Standalone RAG flow | question → retrieve context → generate answer → cite source | Runs retrieval and answer generation without LangGraph | pnpm rag "Why do long chats become more expensive?" |
| MCP tool-selection flow | LLM → MCP tools → selected tool → tool result → final answer | Lets the LLM choose MCP-exposed tools | pnpm mcp:llm "Explain how teams reduce LLM cost in long conversations" |
Quick Demo
pnpm agent "Can you explain why retrieval should happen before generation?"
pnpm agent "Use my notes about token-heavy conversations to write a short team update"
pnpm agent "Draft a Slack message saying QA signoff is pending"
pnpm agent "Search saved notes for deterministic backend functions"
pnpm agent "Tell me something funny"
pnpm semantic-search "backend-defined tools and input schemas"
pnpm rag "Why do long chats become more expensive?"
pnpm mcp:llm "Explain how teams reduce LLM cost in long conversations"
Tech Stack
- TypeScript
- Vercel AI SDK
- OpenAI models via @ai-sdk/openai
- LangGraph
- Model Context Protocol TypeScript SDK
- Zod
- pnpm
Project Structure
lib/
agent/
agent-state.ts
mini-agent.ts
embedding.ts
knowledge-base.ts
retrieve-context.ts
semantic-search.ts
rag-answer.ts
rag-types.ts
vector-utils.ts
mcp/
server.ts
client-test.ts
llm-client-test.ts
scripts/
agent.ts
rag.ts
semantic-search.ts
Implementation Notes
1. Semantic search
The project embeds notes and queries, then ranks notes by vector similarity.
query
→ embedding
→ similarity search
→ ranked notes
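The ranking step above can be sketched as follows. This is a minimal illustration, not the repo's actual code: it assumes cosine similarity as the metric, and the `EmbeddedNote` shape, `rankNotes`, `topK`, and `minScore` are hypothetical names chosen for the example. Embeddings are taken as already computed.

```typescript
// Hypothetical shapes for illustration; the project's real types may differ.
interface EmbeddedNote {
  id: string;
  text: string;
  embedding: number[];
}

interface RankedNote {
  id: string;
  text: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every note against the query embedding, drop notes below minScore,
// and return the topK highest-scoring notes in descending order.
function rankNotes(
  queryEmbedding: number[],
  notes: EmbeddedNote[],
  topK = 3,
  minScore = 0,
): RankedNote[] {
  return notes
    .map((n) => ({
      id: n.id,
      text: n.text,
      score: cosineSimilarity(queryEmbedding, n.embedding),
    }))
    .filter((n) => n.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A brute-force scan like this is reasonable for a small local knowledge base; a database-backed vector index would replace it at larger scale.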
2. RAG
The RAG flow retrieves relevant context before generating an answer.
question
→ retrieve context
→ generate grounded answer
→ include source citation
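One way to keep answers grounded and citable is to label each retrieved note with its ID in the prompt and then check which IDs the model actually cited. The sketch below assumes a bracketed `[note-id]` citation format and uses hypothetical helper names (`buildGroundedPrompt`, `extractCitations`); the project's own prompt wording and citation format may differ. The model call itself is omitted.

```typescript
// Hypothetical shape for a retrieved note.
interface RetrievedNote {
  id: string;
  text: string;
}

// Build a prompt that restricts the model to the retrieved notes
// and asks it to cite note IDs in square brackets.
function buildGroundedPrompt(question: string, notes: RetrievedNote[]): string {
  const context = notes.map((n) => `[${n.id}] ${n.text}`).join("\n");
  return [
    "Answer the question using only the notes below.",
    "Cite the note IDs you relied on in square brackets.",
    "If the notes do not contain the answer, say so.",
    "",
    "Notes:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Collect bracketed IDs from the answer, keeping only IDs that were
// actually retrieved, so invented citations are discarded.
function extractCitations(answer: string, notes: RetrievedNote[]): string[] {
  const known = new Set(notes.map((n) => n.id));
  const cited = new Set<string>();
  const pattern = /\[([^\]]+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    if (known.has(match[1])) {
      cited.add(match[1]);
    }
  }
  return Array.from(cited);
}
```

Filtering citations against the retrieved set is a cheap grounding check: an answer that cites nothing from the retrieved notes can be treated as ungrounded.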
3. LangGraph orchestration
The agent workflow uses LangGraph to keep explicit state across nodes.
Example plans:
retrieve_notes → answer_question → final_response
retrieve_notes → draft_message → final_response
draft_message → final_response
final_response
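Constraining the planner to a known set of step sequences could look like the sketch below. It is an assumption-laden illustration: the allow-list approach and the names `ALLOWED_PLANS`, `isAllowedPlan`, and `normalizePlan` are hypothetical, and the project may instead validate planner output with a Zod schema. The step names and sequences are taken from the flows listed in this README.

```typescript
type PlanStep =
  | "retrieve_notes"
  | "answer_question"
  | "draft_message"
  | "final_response";

// Step sequences the README lists as supported flows.
const ALLOWED_PLANS: PlanStep[][] = [
  ["retrieve_notes", "answer_question", "final_response"],
  ["retrieve_notes", "draft_message", "final_response"],
  ["retrieve_notes", "final_response"],
  ["draft_message", "final_response"],
  ["final_response"],
];

// True only if the proposed steps exactly match one allowed sequence.
function isAllowedPlan(steps: string[]): steps is PlanStep[] {
  return ALLOWED_PLANS.some(
    (plan) =>
      plan.length === steps.length &&
      plan.every((step, i) => step === steps[i]),
  );
}

// Any plan outside the allow-list collapses to the fallback-only plan.
function normalizePlan(steps: string[]): PlanStep[] {
  return isAllowedPlan(steps) ? steps : ["final_response"];
}
```

With this check, a planner output such as `answer_question → final_response` (answering without retrieval) would be rejected and routed straight to the fallback response.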
4. Retrieval-query extraction
The planner extracts a focused retrieval query instead of sending the full user request to semantic search.
Example:
User request:
Use my notes about token-heavy conversations to write a short team update
Retrieval query:
token-heavy conversations
5. Decomposed RAG
RAG is split into retrieval and generation steps inside the agent workflow.
retrieve context
→ answer or draft from retrieved context
→ final response
This keeps retrieval, generation, routing, and fallback behavior visible and independently controllable.
6. Conditional routing
The graph routes on the current state: the step the planner requested and whether retrieval returned any context.
if context found and answer requested → answer_question
if context found and draft requested → draft_message
if no context → retry or fallback
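The routing rules above can be expressed as a pure function over the shared state. The sketch below is hypothetical: the `RoutingState` fields, the route names `retry_retrieval` and `fallback`, and `MAX_RETRIES` are assumptions for illustration, not the project's actual state shape. A function of this kind is what LangGraph's `addConditionalEdges` takes as its routing callback; the graph wiring is omitted here.

```typescript
type Route =
  | "answer_question"
  | "draft_message"
  | "retry_retrieval"
  | "fallback";

// Hypothetical slice of the shared graph state used for routing.
interface RoutingState {
  intent: "answer" | "draft"; // what the planner asked for
  contextCount: number;       // number of notes retrieval returned
  retryCount: number;         // retrieval retries already spent
}

// Assumed retry budget of one, matching the single-retry behavior described in this README.
const MAX_RETRIES = 1;

// Decide the next node after retrieval has run.
function routeAfterRetrieval(state: RoutingState): Route {
  if (state.contextCount > 0) {
    return state.intent === "answer" ? "answer_question" : "draft_message";
  }
  // No context: retry while budget remains, otherwise stop safely.
  return state.retryCount < MAX_RETRIES ? "retry_retrieval" : "fallback";
}
```

Keeping the decision in a pure function makes each branch testable without running the graph or calling a model.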
7. Bounded retry
When retrieval returns no usable results, the agent retries exactly once with a broader query and a lower score threshold, then stops.
focused query fails
→ retry with broader query
→ succeed or stop safely
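A bounded retry of this kind might be sketched as below. The `SearchFn` signature, the `RetrievalAttempt` shape, and `retrieveWithRetry` are hypothetical names; the search function is injected so the sketch stays independent of any embedding provider, and the thresholds in the usage example are made-up values.

```typescript
interface Hit {
  id: string;
  score: number;
}

// Hypothetical search signature: returns hits scoring at or above minScore.
type SearchFn = (query: string, minScore: number) => Promise<Hit[]>;

interface RetrievalAttempt {
  query: string;
  minScore: number;
}

interface RetrievalResult {
  hits: Hit[];
  attempts: number; // how many searches were run (1 or 2)
}

// Try the focused attempt first, then the broader one. The attempt list is
// fixed at two entries, so the loop cannot run more than one retry.
async function retrieveWithRetry(
  search: SearchFn,
  focused: RetrievalAttempt,
  broader: RetrievalAttempt,
): Promise<RetrievalResult> {
  const attempts = [focused, broader];
  for (let i = 0; i < attempts.length; i++) {
    const hits = await search(attempts[i].query, attempts[i].minScore);
    if (hits.length > 0) {
      return { hits, attempts: i + 1 };
    }
  }
  // Both attempts came back empty: return an empty result so the caller can fall back.
  return { hits: [], attempts: attempts.length };
}
```

A caller might pass `{ query: "token-heavy conversations", minScore: 0.5 }` as the focused attempt and `{ query: "token cost", minScore: 0.3 }` as the broader one, then route to the fallback response when `hits` is empty.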
8. Safe fallback
Unsupported or unrelated requests return a bounded fallback rather than an answer generated without supporting context.
Example:
pnpm agent "Tell me something funny"
Expected behavior:
returns a bounded fallback instead of answering from unsupported context
MCP Interface
The MCP server exposes:
| MCP Primitive | Name | Purpose |
|---|---|---|
| Tool | search_notes | Search saved notes |
| Tool | answer_from_notes | Answer questions from notes |
| Resource | notes://all | Read local knowledge base |
| Prompt | rag_answer_prompt | Prompt template for grounded answers |
The LangGraph agent and MCP examples are intentionally kept as separate flows in this repo.
- The LangGraph flow demonstrates controlled agent orchestration with state, planning, routing, retrieval, drafting, retries, and fallback behavior.
- The MCP flow demonstrates how the same search/RAG capabilities can be exposed to external MCP clients as tools, resources, and prompts.
In a larger application, these patterns can be combined by having LangGraph nodes call MCP tools for external capabilities such as GitHub, Jira, Slack, or Confluence.
Knowledge Base
The local knowledge base contains notes about:
- RAG basics
- token cost in chat apps
- workflow routing
- tool calling
- semantic search
The knowledge base is intentionally small so retrieval, ranking, grounding, and routing behavior are easy to inspect.
Design Scope
Current scope:
- local knowledge base
- embedding-based semantic search
- RAG with source citations
- CLI scripts instead of a web UI
- MCP over stdio
- controlled LangGraph workflow
The focus is on the core mechanics of retrieval, grounding, planning, routing, MCP tool exposure, and bounded agent behavior.
Next Improvements
- Improve CLI output formatting
- Add clearer docs for LangGraph state and routing
- Add sample output snapshots
- Add basic tests for retrieval and RAG behavior
TODO
Short Term
- Add more knowledge-base examples
- Add an eval script for expected retrieval results
- Add safer handling for low-confidence retrieval
- Add structured output for final agent responses
Future Extensions
- Database-backed vector search
- Chunking for longer documents
- Richer source metadata
- External integrations
- Guardrails and approval workflows
- Observability logs
- Eval suite