arXiv MCP Server
Enables searching, downloading, and managing academic papers from arXiv.org through natural language interactions. Provides tools for paper discovery, PDF downloads, and local paper collection management.
arXiv CLI & MCP Server
A Python toolkit for searching and downloading papers from arXiv.org, with both a command-line interface and a Model Context Protocol (MCP) server for LLM integration.
Agents that run in a terminal work well with well-documented CLI tools, MCP servers, or both. This project provides both.
Features
- Search arXiv papers by title, author, abstract, category, and more
- Download PDFs automatically with local caching
- MCP Server for integration with LLM assistants (Claude Desktop, etc.)
- Typed responses using Pydantic models for clean data handling
- Rate limiting built-in to respect arXiv API guidelines
- Integration tests: 26 tests that run against the live arXiv API (no mocking)
Installation
Option 1: Install from GitHub (Recommended)
Install directly from the GitHub repository:
# Install the latest version
uv pip install git+https://github.com/LiamConnell/arxiv_for_agents.git
# Or with pip
pip install git+https://github.com/LiamConnell/arxiv_for_agents.git
# Now you can use the arxiv command
arxiv --help
Option 2: Install from Source
Clone the repository and install locally:
# Clone the repository
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
# Install in editable mode
uv pip install -e .
# Now you can use the arxiv command
arxiv --help
Option 3: Development Installation
For development with all dependencies:
# Clone and install with dev dependencies
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
uv pip install -e ".[dev]"
# Run tests
uv run pytest
Verify Installation
# If installed as package
arxiv --help
# Or if using as module
uv run python -m arxiv --help
Usage
Note: If you installed as a package, use arxiv directly. Otherwise, use uv run python -m arxiv.
Search Papers
Search by title:
# Using installed package
arxiv search "ti:attention is all you need"
# Or using as module
uv run python -m arxiv search "ti:attention is all you need"
Search by author:
arxiv search "au:Hinton" --max-results 20
Search by category:
arxiv search "cat:cs.AI" --max-results 10
Combined search:
arxiv search "ti:transformer AND au:Vaswani"
Get Specific Paper
Get paper metadata and download PDF:
arxiv get 1706.03762
Get metadata only (no download):
arxiv get 1706.03762 --no-download
Force re-download:
arxiv get 1706.03762 --force
Download PDF
Download just the PDF:
arxiv download 1706.03762
List Downloaded PDFs
arxiv list-downloads
JSON Output
Get results as JSON for scripting:
arxiv search "ti:neural" --json
arxiv get 1706.03762 --json --no-download
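To script against the JSON output from Python instead of `jq`, the standard library is enough. The following is a minimal sketch. It assumes the search output is an object with an `entries` list whose items carry a `title` key, which is the shape implied by the `jq '.entries[].title'` example under CLI Examples rather than a documented schema. The helper names `titles_from_search_json` and `search_titles` are hypothetical.

```python
import json
import subprocess


def titles_from_search_json(raw: str) -> list[str]:
    """Extract paper titles from `arxiv search ... --json` output.

    Assumes (hypothetically) a top-level object with an "entries" list.
    """
    data = json.loads(raw)
    return [entry["title"] for entry in data.get("entries", [])]


def search_titles(query: str) -> list[str]:
    """Run the CLI and return titles; requires the `arxiv` command on PATH."""
    result = subprocess.run(
        ["arxiv", "search", query, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return titles_from_search_json(result.stdout)
```

For example, `search_titles("ti:neural")` returns the titles as a Python list, ready for further filtering.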
Search Query Syntax
The arXiv API supports field-specific searches:
- ti: - Title
- au: - Author
- abs: - Abstract
- cat: - Category (e.g., cs.AI, cs.LG)
- all: - All fields (default)
You can combine searches with AND, OR, and ANDNOT:
arxiv search "ti:neural AND cat:cs.LG"
arxiv search "au:Hinton OR au:Bengio"
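The field prefixes and boolean operators combine into a single query string. The sketch below assembles one programmatically. The helper `build_query` is hypothetical and is not part of this package. It wraps multi-word values in double quotes, which the arXiv API treats as a phrase search.

```python
VALID_FIELDS = {"ti", "au", "abs", "cat", "all"}
VALID_OPS = {"AND", "OR", "ANDNOT"}


def build_query(*clauses: tuple[str, str], op: str = "AND") -> str:
    """Join (field, value) pairs into an arXiv search query string.

    Hypothetical helper: fields use the arXiv prefixes (ti, au, abs, cat, all),
    and a single operator (AND, OR, ANDNOT) joins every clause.
    """
    if op not in VALID_OPS:
        raise ValueError(f"unsupported operator: {op}")
    parts = []
    for field, value in clauses:
        if field not in VALID_FIELDS:
            raise ValueError(f"unsupported field prefix: {field}")
        # Quote multi-word values so they are matched as a phrase.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{field}:{value}")
    return f" {op} ".join(parts)
```

Here `build_query(("au", "Hinton"), ("au", "Bengio"), op="OR")` yields `au:Hinton OR au:Bengio`, which matches the second example above.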
Download Directory
PDFs are downloaded to ./.arxiv by default. Change this with:
arxiv --download-dir ./papers search "ti:transformer"
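Combined with local caching (an existing PDF is reused unless a forced re-download is requested), the download-location logic can be pictured as follows. This sketch rests on several assumptions. The `<arxiv_id>.pdf` file naming, the replacement of `/` in old-style IDs, and the helper names `cached_pdf_path` and `needs_download` are illustrative and are not taken from the package.

```python
from pathlib import Path


def cached_pdf_path(arxiv_id: str, download_dir: str = "./.arxiv") -> Path:
    """Return where a PDF for arxiv_id would be stored (hypothetical layout).

    Old-style IDs such as "hep-th/9901001" contain a slash, so it is
    replaced to keep every file directly inside download_dir.
    """
    safe_name = arxiv_id.replace("/", "_")
    return Path(download_dir) / f"{safe_name}.pdf"


def needs_download(arxiv_id: str, download_dir: str = "./.arxiv", force: bool = False) -> bool:
    """True if the PDF is missing locally or a re-download was forced."""
    return force or not cached_pdf_path(arxiv_id, download_dir).exists()
```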
MCP Server (Model Context Protocol)
The arXiv CLI includes a Model Context Protocol (MCP) server that allows LLM assistants (like Claude Desktop) to search and download arXiv papers programmatically.
Running the MCP Server
# Option 1: Using the script entry point (recommended)
uv run arxiv-mcp
# Option 2: Using the module
uv run python -m arxiv.mcp
The server runs in stdio mode and communicates via JSON-RPC over stdin/stdout.
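To make the transport concrete: over stdio, MCP frames each JSON-RPC 2.0 message as a single newline-terminated line of JSON. The sketch below builds the `tools/call` request that a client would send to invoke the `search_papers` tool listed under MCP Tools. The method name and the `name`/`arguments` shape of `params` come from the MCP specification. The argument names `query` and `max_results` are assumptions about this server's tool signature. A real client must also complete the `initialize` handshake before it calls any tool.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, unique per session


def jsonrpc_request(method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request as one newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    # json.dumps without indent never emits raw newlines, as stdio framing requires.
    return json.dumps(message) + "\n"


def call_tool(name: str, arguments: dict) -> str:
    """Build an MCP tools/call request for the named tool."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


# Hypothetical arguments; the authoritative schema is what the server returns from tools/list.
line = call_tool("search_papers", {"query": "ti:transformer", "max_results": 5})
```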
MCP Tools
The server provides 4 tools for paper discovery and management:
- search_papers - Search arXiv with advanced query syntax
  - Supports field prefixes (ti:, au:, abs:, cat:)
  - Boolean operators (AND, OR, ANDNOT)
  - Pagination and sorting options
  - Returns paper metadata including title, authors, abstract, categories
- get_paper - Get detailed information about a specific paper
  - Accepts flexible ID formats (1706.03762, arXiv:1706.03762, 1706.03762v1)
  - Optionally downloads PDF automatically
  - Returns complete metadata including DOI, journal references, comments
- download_paper - Download PDF for a specific paper
  - Downloads to the local .arxiv directory
  - Returns file path and size information
  - Supports force re-download option
- list_downloaded_papers - List all locally downloaded PDFs
  - Shows arXiv IDs, file sizes, and paths
  - Useful for managing local paper collection
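`get_paper` accepts several ID spellings. One way to reduce them to a clean ID is a regular expression that strips an optional `arXiv:` prefix and a trailing version suffix. This sketch illustrates the idea and is not the server's actual parser. The function name `normalize_arxiv_id` is hypothetical, and the pattern also accepts old-style IDs such as `hep-th/9901001`.

```python
import re

_ID_PATTERN = re.compile(
    r"^(?:arxiv:)?"                       # optional "arXiv:" prefix (case-insensitive)
    r"(?P<id>\d{4}\.\d{4,5}"              # new style: YYMM.NNNN or YYMM.NNNNN
    r"|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})"    # old style: archive[.SC]/YYMMNNN
    r"(?:v\d+)?$",                        # optional version suffix, e.g. v1
    re.IGNORECASE,
)


def normalize_arxiv_id(raw: str) -> str:
    """Return the bare arXiv ID, e.g. 'arXiv:1706.03762v1' -> '1706.03762'."""
    match = _ID_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognizable arXiv ID: {raw!r}")
    return match.group("id")
```

All three formats from the list above (`1706.03762`, `arXiv:1706.03762`, `1706.03762v1`) normalize to `1706.03762`.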
MCP Resources
The server exposes 2 resources for direct access:
- paper://{arxiv_id} - Get formatted paper metadata in markdown
- downloads://list - Get markdown table of all downloaded papers
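The `downloads://list` resource returns a markdown table. The following is a minimal rendering sketch. It assumes that each downloaded paper is described by its arXiv ID, its size in bytes, and its path, which are the fields `list_downloaded_papers` reports. The `DownloadedPaper` record, the function name, and the column layout are illustrative and do not reproduce the server's exact output.

```python
from dataclasses import dataclass


@dataclass
class DownloadedPaper:
    """Hypothetical record for one locally cached PDF."""
    arxiv_id: str
    size_bytes: int
    path: str


def downloads_table(papers: list[DownloadedPaper]) -> str:
    """Render downloaded papers as a markdown table, sorted by arXiv ID."""
    lines = ["| arXiv ID | Size (KB) | Path |", "| --- | ---: | --- |"]
    for p in sorted(papers, key=lambda paper: paper.arxiv_id):
        lines.append(f"| {p.arxiv_id} | {p.size_bytes / 1024:.1f} | {p.path} |")
    if not papers:
        lines.append("| (none) | | |")
    return "\n".join(lines)
```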
MCP Prompts
Pre-built prompt templates to guide usage:
- search_arxiv_prompt - Guide for searching arXiv papers
- download_paper_prompt - Guide for downloading and managing papers
Claude Desktop Configuration
Add to your Claude Desktop config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
If installed from GitHub/pip:
{
  "mcpServers": {
    "arxiv": {
      "command": "arxiv-mcp"
    }
  }
}
If running from source/development:
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["run", "arxiv-mcp"],
      "cwd": "/path/to/arxiv_for_agents"
    }
  }
}
Or use --directory to avoid needing cwd:
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["--directory", "/path/to/arxiv_for_agents", "run", "arxiv-mcp"]
    }
  }
}
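Instead of editing the JSON by hand, a short script can merge the entry into an existing config and leave any other `mcpServers` entries untouched. This sketch assumes the macOS config path given above and the `uv --directory` form. The function name `add_arxiv_server` is hypothetical. Claude Desktop typically has to be restarted before it picks up config changes.

```python
import json
from pathlib import Path

DEFAULT_CONFIG = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"


def add_arxiv_server(repo_dir: str, config_path: Path = DEFAULT_CONFIG) -> dict:
    """Insert or update the "arxiv" MCP server entry and write the file back."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same shape as the "--directory" example: no cwd needed.
    servers["arxiv"] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "arxiv-mcp"],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_arxiv_server("/path/to/arxiv_for_agents")` writes the same entry as the last JSON example.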
MCP Use Cases
Once configured, you can ask Claude to:
- "Search arXiv for recent papers on transformer architectures"
- "Find papers by Geoffrey Hinton in the cs.AI category"
- "Download the 'Attention is All You Need' paper"
- "Show me papers about neural networks from 2023"
- "List all the papers I've downloaded"
- "Get the abstract for arXiv:1706.03762"
The MCP integration allows Claude to autonomously search, retrieve, and manage academic papers from arXiv.
Architecture
Module Structure
arxiv/
├── __init__.py # Package exports
├── __main__.py # CLI entry point
├── cli.py # Click commands
├── models.py # Pydantic models
├── services.py # API client service
└── mcp/ # MCP server
├── __init__.py # MCP package exports
├── __main__.py # MCP server entry point
└── server.py # FastMCP server with tools, resources, prompts
tests/
└── test_services.py # Integration tests (26 tests)
Pydantic Models
All API responses are typed using Pydantic:
from arxiv import ArxivService
service = ArxivService()
result = service.search("ti:neural", max_results=5)
# result is typed as ArxivSearchResult
print(f"Total: {result.total_results}")
for entry in result.entries:
    # entry is typed as ArxivEntry
    print(f"{entry.arxiv_id}: {entry.title}")
    print(f"Authors: {', '.join(a.name for a in entry.authors)}")
Key Models
- ArxivSearchResult: Search results with metadata
  - total_results: Total matching papers
  - entries: List of ArxivEntry objects
- ArxivEntry: Individual paper
  - arxiv_id: Clean ID (e.g., "1706.03762")
  - title, summary: Paper metadata
  - authors: List of Author objects
  - categories: Subject categories
  - pdf_url: Direct PDF link
  - published, updated: Datetime objects
- Author: Paper author
  - name: Author name
  - affiliation: Optional affiliation
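If Pydantic is not available, standard-library dataclasses can mirror the same field layout. This is only a structural sketch. The real models are Pydantic classes with validation, and the field types here (for example, `Optional[str]` for `affiliation`) are inferred from the descriptions above rather than copied from `models.py`.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Author:
    name: str
    affiliation: Optional[str] = None  # not every author lists one


@dataclass
class ArxivEntry:
    arxiv_id: str            # clean ID, e.g. "1706.03762"
    title: str
    summary: str
    authors: list[Author]
    categories: list[str]    # e.g. ["cs.CL", "cs.LG"]
    pdf_url: str
    published: datetime
    updated: datetime


@dataclass
class ArxivSearchResult:
    total_results: int       # total matching papers, not just those in entries
    entries: list[ArxivEntry] = field(default_factory=list)
```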
Testing
Run all 26 integration tests (makes real API calls):
uv run pytest tests/test_services.py -v
Run specific test class:
uv run pytest tests/test_services.py::TestArxivServiceSearch -v
The tests are integration tests that hit the real arXiv API, ensuring the service works with actual data.
API Rate Limiting
The service enforces a 3-second delay between API requests by default (arXiv's recommendation). You can adjust this:
from arxiv import ArxivService
service = ArxivService(rate_limit_delay=5.0) # 5 seconds
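A minimum-interval limiter is enough to enforce this spacing: before each request, sleep for whatever remains of the delay since the previous request. The sketch below shows the technique and is not the service's actual code. The class name `RateLimiter` is hypothetical.

```python
import time


class RateLimiter:
    """Enforce a minimum delay (in seconds) between successive calls to wait()."""

    def __init__(self, delay: float = 3.0) -> None:
        self.delay = delay
        self._last = float("-inf")  # no previous call yet, so the first wait() is free

    def wait(self) -> None:
        """Block until at least `delay` seconds have passed since the last call."""
        remaining = self.delay - (time.monotonic() - self._last)
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()
```

Call `limiter.wait()` immediately before each HTTP request. `time.monotonic()` is used instead of `time.time()` so that system clock adjustments cannot shorten the delay.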
Examples
Python API
from arxiv import ArxivService
# Initialize service
service = ArxivService(download_dir="./papers")
# Search
results = service.search(
    query="ti:attention is all you need",
    max_results=5,
    sort_by="relevance"
)
print(f"Found {results.total_results} papers")
for entry in results.entries:
    print(f"- {entry.title}")
# Get specific paper
entry = service.get("1706.03762", download_pdf=True)
print(f"Downloaded: {entry.title}")
# Just download PDF
pdf_path = service.download_pdf("1706.03762")
print(f"PDF saved to: {pdf_path}")
CLI Examples
# Find recent papers in a category
arxiv search "cat:cs.AI" \
--max-results 10 \
--sort-by submittedDate \
--sort-order descending
# Search and output as JSON for processing
arxiv search "ti:transformer" --json | jq '.entries[].title'
# Batch download multiple papers
for id in 1706.03762 1810.04805 2010.11929; do
arxiv download $id
done
Development
The codebase follows these principles:
- Type safety: Pydantic models for all API responses
- Clean architecture: Separation of CLI, service, and models
- Real tests: Integration tests with actual API calls (no mocks)
- Rate limiting: Respects arXiv API guidelines
- Caching: Automatic local caching to avoid re-downloads
arXiv API Reference
- Base URL: https://export.arxiv.org/api/query
- Format: Atom XML
- Rate limit: 3 seconds between requests (recommended)
- Documentation: https://info.arxiv.org/help/api/user-manual.html
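Putting these reference details together: a request is a GET to the base URL with the `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters described in the arXiv API user manual, and the Atom response can be read with the standard library. This sketch builds the URL and parses a feed that has already been fetched. It illustrates the protocol and is not this package's `services.py`. The helper names `build_url` and `parse_feed` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://export.arxiv.org/api/query"
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
}


def build_url(query: str, start: int = 0, max_results: int = 10,
              sort_by: str = "relevance", sort_order: str = "descending") -> str:
    """Compose a query URL; sortBy is relevance, lastUpdatedDate, or submittedDate."""
    params = {
        "search_query": query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_feed(xml_text: str) -> tuple[int, list[dict]]:
    """Return (total_results, entries) from an Atom feed string."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("opensearch:totalResults", default="0", namespaces=NS))
    entries = []
    for entry in root.findall("atom:entry", NS):
        # <id> looks like http://arxiv.org/abs/1706.03762v7; keep the part after /abs/
        # (the version suffix is still included).
        raw_id = entry.findtext("atom:id", default="", namespaces=NS)
        title = entry.findtext("atom:title", default="", namespaces=NS)
        entries.append({
            "arxiv_id": raw_id.split("/abs/")[-1],
            "title": " ".join(title.split()),  # collapse line breaks inside titles
            "authors": [a.findtext("atom:name", default="", namespaces=NS)
                        for a in entry.findall("atom:author", NS)],
        })
    return total, entries
```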
License
This is a personal project for interacting with arXiv's public API.
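Recommended Servers
Baidu Map
Baidu Map's core APIs are now fully compatible with the MCP protocol, making it the first map service provider in China to support MCP.
Playwright MCP Server
A Model Context Protocol server that enables large language models to interact with web pages through structured accessibility snapshots, without needing vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern user interface components from natural-language descriptions and integrates with popular IDEs to streamline UI development.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, supporting the extraction and analysis of marketing insights and audience data, including demographics, behavior, and influencer engagement.
VeyraX
A single MCP tool that connects all your favorite tools: Gmail, Calendar, and more than 40 others.
graphlit-mcp-server
A Model Context Protocol (MCP) server that integrates MCP clients with the Graphlit service. Beyond web crawling, it can ingest anything (from Slack to Gmail to podcast feeds) into a Graphlit project and then retrieve relevant content from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search with Claude AI, enabling Claude to run real-time web searches when answering questions that need up-to-date information.
e2b-mcp-server
Runs code via e2b using MCP.
Neon MCP Server
An MCP server for interacting with the Neon Management API and databases.
Exa MCP Server
A Model Context Protocol (MCP) server that lets AI assistants such as Claude perform web searches with the Exa AI Search API. This setup gives AI models access to real-time web information in a safe and controlled way.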