arXiv MCP Server

Enables searching, downloading, and managing academic papers from arXiv.org through natural language interactions. Provides tools for paper discovery, PDF downloads, and local paper collection management.

arXiv CLI & MCP Server

A Python toolkit for searching and downloading papers from arXiv.org, with both a command-line interface and a Model Context Protocol (MCP) server for LLM integration.

Agents that work from the command line do well with either a well-documented CLI tool or an MCP server, so this project provides both.

Features

  • Search arXiv papers by title, author, abstract, category, and more
  • Download PDFs automatically with local caching
  • MCP Server for integration with LLM assistants (Claude Desktop, etc.)
  • Typed responses using Pydantic models for clean data handling
  • Rate limiting built-in to respect arXiv API guidelines
  • 26 integration tests that run against the real arXiv API (no mocking)

Installation

Option 1: Install from GitHub (Recommended)

Install directly from the GitHub repository:

# Install the latest version
uv pip install git+https://github.com/LiamConnell/arxiv_for_agents.git

# Or with pip
pip install git+https://github.com/LiamConnell/arxiv_for_agents.git

# Now you can use the arxiv command
arxiv --help

Option 2: Install from Source

Clone the repository and install locally:

# Clone the repository
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents

# Install in editable mode
uv pip install -e .

# Now you can use the arxiv command
arxiv --help

Option 3: Development Installation

For development with all dependencies:

# Clone and install with dev dependencies
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
uv pip install -e ".[dev]"

# Run tests
uv run pytest

Verify Installation

# If installed as package
arxiv --help

# Or if using as module
uv run python -m arxiv --help

Usage

Note: If you installed the package, run the arxiv command directly. Otherwise, run it as a module with uv run python -m arxiv.

Search Papers

Search by title:

# Using installed package
arxiv search "ti:attention is all you need"

# Or using as module
uv run python -m arxiv search "ti:attention is all you need"

Search by author:

arxiv search "au:Hinton" --max-results 20

Search by category:

arxiv search "cat:cs.AI" --max-results 10

Combined search:

arxiv search "ti:transformer AND au:Vaswani"

Get Specific Paper

Get paper metadata and download PDF:

arxiv get 1706.03762

Get metadata only (no download):

arxiv get 1706.03762 --no-download

Force re-download:

arxiv get 1706.03762 --force

Download PDF

Download just the PDF:

arxiv download 1706.03762

List Downloaded PDFs

arxiv list-downloads

JSON Output

Get results as JSON for scripting:

arxiv search "ti:neural" --json
arxiv get 1706.03762 --json --no-download

Search Query Syntax

The arXiv API supports field-specific searches:

  • ti: - Title
  • au: - Author
  • abs: - Abstract
  • cat: - Category (e.g., cs.AI, cs.LG)
  • all: - All fields (default)

You can combine searches with AND, OR, and ANDNOT:

arxiv search "ti:neural AND cat:cs.LG"
arxiv search "au:Hinton OR au:Bengio"
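
For scripted use, a query string like the ones above can be assembled from (field, value) pairs. The sketch below is illustrative only: `build_query` is a hypothetical helper, not part of this project, and it does nothing more than join the documented prefixes with one boolean operator.

```python
def build_query(terms, operator="AND"):
    """Join (field, value) pairs into an arXiv query string.

    Hypothetical helper for illustration; not part of this project.
    """
    allowed_fields = {"ti", "au", "abs", "cat", "all"}
    allowed_operators = {"AND", "OR", "ANDNOT"}
    if operator not in allowed_operators:
        raise ValueError(f"unsupported operator: {operator}")
    parts = []
    for field, value in terms:
        if field not in allowed_fields:
            raise ValueError(f"unsupported field prefix: {field}")
        parts.append(f"{field}:{value}")
    return f" {operator} ".join(parts)


print(build_query([("ti", "neural"), ("cat", "cs.LG")]))
# ti:neural AND cat:cs.LG
print(build_query([("au", "Hinton"), ("au", "Bengio")], operator="OR"))
# au:Hinton OR au:Bengio
```

The resulting string can be passed to arxiv search as a single quoted argument.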

Download Directory

PDFs are downloaded to ./.arxiv by default. Change this with the --download-dir option:

arxiv --download-dir ./papers search "ti:transformer"
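
The caching behaviour (skip the download when the PDF is already present, unless a re-download is forced) might look roughly like the sketch below. This is not the project's code: the function names are hypothetical and the `<arxiv_id>.pdf` file naming is an assumption.

```python
from pathlib import Path


def cached_pdf_path(arxiv_id, download_dir="./.arxiv"):
    """Return where the PDF for `arxiv_id` would live (file naming is assumed)."""
    return Path(download_dir) / f"{arxiv_id}.pdf"


def needs_download(arxiv_id, download_dir="./.arxiv", force=False):
    """True when the PDF is missing locally or a re-download is forced."""
    return force or not cached_pdf_path(arxiv_id, download_dir).exists()
```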

MCP Server (Model Context Protocol)

The arXiv CLI includes a Model Context Protocol (MCP) server that allows LLM assistants (like Claude Desktop) to search and download arXiv papers programmatically.

Running the MCP Server

# Option 1: Using the script entry point (recommended)
uv run arxiv-mcp

# Option 2: Using the module
uv run python -m arxiv.mcp

The server runs in stdio mode and communicates via JSON-RPC over stdin/stdout.
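
To make the stdio transport concrete, the sketch below shows how a client could frame messages: each JSON-RPC 2.0 message is a single line of JSON terminated by a newline. It builds a `tools/list` request, the standard MCP method for enumerating a server's tools, but does not start the server. The helper names are hypothetical, and a real session begins with the MCP `initialize` handshake, which is omitted here.

```python
import json


def encode_request(request_id, method, params=None):
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    # json.dumps emits no raw newlines, so the message stays on one line.
    return json.dumps(message) + "\n"


def decode_message(line):
    """Parse one line read from the server's stdout."""
    return json.loads(line)


line = encode_request(1, "tools/list")
print(line, end="")
```

In practice an MCP client library handles this framing; the sketch only shows what travels over stdin and stdout.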

MCP Tools

The server provides 4 tools for paper discovery and management:

  1. search_papers - Search arXiv with advanced query syntax

    • Supports field prefixes (ti:, au:, abs:, cat:)
    • Boolean operators (AND, OR, ANDNOT)
    • Pagination and sorting options
    • Returns paper metadata including title, authors, abstract, categories
  2. get_paper - Get detailed information about a specific paper

    • Accepts flexible ID formats (1706.03762, arXiv:1706.03762, 1706.03762v1)
    • Optionally downloads PDF automatically
    • Returns complete metadata including DOI, journal references, comments
  3. download_paper - Download PDF for a specific paper

    • Downloads to local .arxiv directory
    • Returns file path and size information
    • Supports force re-download option
  4. list_downloaded_papers - List all locally downloaded PDFs

    • Shows arxiv IDs, file sizes, and paths
    • Useful for managing local paper collection
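
The flexible ID formats accepted by get_paper could be reduced to one canonical form along the following lines. This is a sketch under assumptions, not the server's actual code: `normalize_arxiv_id` is a hypothetical name, only new-style IDs (such as 1706.03762) are handled, and dropping the version suffix by default is a guess at the intended behaviour.

```python
import re

# New-style IDs such as 1706.03762, optionally prefixed with "arXiv:"
# and optionally suffixed with a version such as "v1".
_ID_PATTERN = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)


def normalize_arxiv_id(raw, keep_version=False):
    """Return the bare ID, with the version suffix only if requested."""
    match = _ID_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized arXiv ID: {raw!r}")
    base, version = match.group(1), match.group(2)
    if keep_version and version:
        return base + version
    return base


for raw in ("1706.03762", "arXiv:1706.03762", "1706.03762v1"):
    print(normalize_arxiv_id(raw))  # prints 1706.03762 each time
```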

MCP Resources

The server exposes 2 resources for direct access:

  • paper://{arxiv_id} - Get formatted paper metadata in markdown
  • downloads://list - Get markdown table of all downloaded papers

MCP Prompts

Pre-built prompt templates to guide usage:

  • search_arxiv_prompt - Guide for searching arXiv papers
  • download_paper_prompt - Guide for downloading and managing papers

Claude Desktop Configuration

Add to your Claude Desktop config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

If installed from GitHub/pip:

{
  "mcpServers": {
    "arxiv": {
      "command": "arxiv-mcp"
    }
  }
}

If running from source/development:

{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["run", "arxiv-mcp"],
      "cwd": "/path/to/arxiv_for_agents"
    }
  }
}

Alternatively, pass --directory to uv so that cwd is not needed:

{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["--directory", "/path/to/arxiv_for_agents", "run", "arxiv-mcp"]
    }
  }
}
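
If the config file already lists other servers, editing it by hand risks clobbering them. The sketch below merges an "arxiv" entry into an existing file while leaving other entries untouched. `add_arxiv_server` is a hypothetical helper, not something shipped with this project, and it assumes the file, if present, contains valid JSON.

```python
import json
from pathlib import Path


def add_arxiv_server(config_path, command="arxiv-mcp", args=None):
    """Add or replace the "arxiv" entry under "mcpServers", keeping other entries."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    config.setdefault("mcpServers", {})["arxiv"] = entry
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For a source checkout, the call might be `add_arxiv_server(path, command="uv", args=["--directory", "/path/to/arxiv_for_agents", "run", "arxiv-mcp"])`, mirroring the last configuration above.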

MCP Use Cases

Once configured, you can ask Claude to:

  • "Search arXiv for recent papers on transformer architectures"
  • "Find papers by Geoffrey Hinton in the cs.AI category"
  • "Download the 'Attention is All You Need' paper"
  • "Show me papers about neural networks from 2023"
  • "List all the papers I've downloaded"
  • "Get the abstract for arXiv:1706.03762"

The MCP integration allows Claude to autonomously search, retrieve, and manage academic papers from arXiv.

Architecture

Module Structure

arxiv/
├── __init__.py       # Package exports
├── __main__.py       # CLI entry point
├── cli.py            # Click commands
├── models.py         # Pydantic models
├── services.py       # API client service
└── mcp/              # MCP server
    ├── __init__.py   # MCP package exports
    ├── __main__.py   # MCP server entry point
    └── server.py     # FastMCP server with tools, resources, prompts

tests/
└── test_services.py  # Integration tests (26 tests)

Pydantic Models

All API responses are typed using Pydantic:

from arxiv import ArxivService

service = ArxivService()
result = service.search("ti:neural", max_results=5)

# result is typed as ArxivSearchResult
print(f"Total: {result.total_results}")

for entry in result.entries:
    # entry is typed as ArxivEntry
    print(f"{entry.arxiv_id}: {entry.title}")
    print(f"Authors: {', '.join(a.name for a in entry.authors)}")

Key Models

  • ArxivSearchResult: Search results with metadata

    • total_results: Total matching papers
    • entries: List of ArxivEntry objects
  • ArxivEntry: Individual paper

    • arxiv_id: Clean ID (e.g., "1706.03762")
    • title, summary: Paper metadata
    • authors: List of Author objects
    • categories: Subject categories
    • pdf_url: Direct PDF link
    • published, updated: Datetime objects
  • Author: Paper author

    • name: Author name
    • affiliation: Optional affiliation

Testing

Run all 26 integration tests (makes real API calls):

uv run pytest tests/test_services.py -v

Run specific test class:

uv run pytest tests/test_services.py::TestArxivServiceSearch -v

These tests call the live arXiv API, so they require network access and verify the service against real data.

API Rate Limiting

The service enforces a 3-second delay between API requests by default (arXiv's recommendation). You can adjust this:

from arxiv import ArxivService

service = ArxivService(rate_limit_delay=5.0)  # 5 seconds
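
A fixed delay between requests can be enforced with a few lines of bookkeeping. The class below is a generic sketch of that idea, not the implementation inside ArxivService; `RateLimiter` and its injectable `clock` and `sleep` parameters are hypothetical names, included so the logic can be exercised without real waiting.

```python
import time


class RateLimiter:
    """Block until at least `delay` seconds have passed since the previous call."""

    def __init__(self, delay=3.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # no request has been made yet

    def wait(self):
        now = self._clock()
        if self._last_call is not None:
            remaining = self.delay - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `wait()` immediately before each API request means the first request goes out at once and later ones are spaced at least `delay` seconds apart.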

Examples

Python API

from arxiv import ArxivService

# Initialize service
service = ArxivService(download_dir="./papers")

# Search
results = service.search(
    query="ti:attention is all you need",
    max_results=5,
    sort_by="relevance"
)

print(f"Found {results.total_results} papers")
for entry in results.entries:
    print(f"- {entry.title}")

# Get specific paper
entry = service.get("1706.03762", download_pdf=True)
print(f"Downloaded: {entry.title}")

# Just download PDF
pdf_path = service.download_pdf("1706.03762")
print(f"PDF saved to: {pdf_path}")

CLI Examples

# Find recent papers in a category
arxiv search "cat:cs.AI" \
  --max-results 10 \
  --sort-by submittedDate \
  --sort-order descending

# Search and output as JSON for processing
arxiv search "ti:transformer" --json | jq '.entries[].title'

# Batch download multiple papers
for id in 1706.03762 1810.04805 2010.11929; do
  arxiv download "$id"
done

Development

The codebase follows these principles:

  1. Type safety: Pydantic models for all API responses
  2. Clean architecture: Separation of CLI, service, and models
  3. Real tests: Integration tests with actual API calls (no mocks)
  4. Rate limiting: Respects arXiv API guidelines
  5. Caching: Automatic local caching to avoid re-downloads

arXiv API Reference

  • Base URL: https://export.arxiv.org/api/query
  • Format: Atom XML
  • Rate limit: 3 seconds between requests (recommended)
  • Documentation: https://info.arxiv.org/help/api/user-manual.html
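
For orientation, the sketch below shows what a raw request to this endpoint looks like and how titles can be read out of the Atom response using only the standard library. It is independent of this project's service code. `search_query`, `start`, and `max_results` are parameters of the arXiv API; the helper names and the sample feed are made up for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def build_query_url(search_query, start=0, max_results=10):
    """Build the request URL; urlencode escapes ':' and spaces."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return f"{BASE_URL}?{urlencode(params)}"


def parse_titles(atom_xml):
    """Extract entry titles, collapsing line breaks and runs of whitespace."""
    root = ET.fromstring(atom_xml)
    return [
        " ".join(entry.findtext(f"{ATOM}title", default="").split())
        for entry in root.findall(f"{ATOM}entry")
    ]


# Hand-written sample feed, not real API output.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Attention Is All
  You Need</title>
  </entry>
</feed>"""

print(build_query_url("cat:cs.AI", max_results=5))
print(parse_titles(SAMPLE))
```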

License

This is a personal project for interacting with arXiv's public API.
