webfetch
License-first federated image search for AI agents and humans. Exposes MCP tools for concise, attribution-aware image discovery, license probing, and guarded downloads across open, platform, and editorial sources.
The license-first image layer for AI agents and humans.
One MCP server, one CLI, and one HTTP server that federate across 25 image
providers, rank results license-first, and reject UNKNOWN results by default.
Any agent that speaks MCP (Claude Code, Cursor, Cline,
Continue, Roo Code, Codex) wires up from one config line. Landing page,
pricing, and hosted usage live at getwebfetch.com.
Install
| Surface | One-liner |
|---|---|
| npm | npm i -g getwebfetch |
| Homebrew | brew tap ashlrai/webfetch && brew install webfetch |
| Docker | docker run --rm ghcr.io/ashlrai/webfetch cli help |
| curl \| bash | curl -fsSL https://raw.githubusercontent.com/ashlrai/webfetch/main/install/install.sh \| bash |
The curl | bash installer also wires webfetch into Claude Code's
~/.claude/settings.json idempotently. Re-run any time to update.
Surfaces
| Surface | Best for | Entry point |
|---|---|---|
| CLI | scripts, shell work, agent handoff | webfetch search ... |
| MCP server | Claude Code, Cursor, Cline, Continue, Roo Code, Codex | npx -y getwebfetch-mcp |
| HTTP server | local integrations and extensions | npx -y webfetch-server |
| Core library | TypeScript apps and custom tooling | npm i webfetch-core |
| Browser layer | fallback extraction and managed-browser flows | npm i webfetch-browser |
| Hosted cloud | pooled keys, usage tracking, team controls | app.getwebfetch.com |
Package-level API notes live in packages/core/README.md,
packages/browser/README.md, and the other
package READMEs under packages/.
30-second usage
CLI:
webfetch search "drake portrait" --limit 5
webfetch artist "Taylor Swift" --kind portrait --min-width 1200
webfetch download <url> --out ./portrait.jpg
printf "drake portrait\nradiohead album\n" | webfetch batch --jsonl --continue-on-error
MCP (from inside any MCP-speaking agent):
search_images({ query: "drake portrait", limit: 5 })
search_artist_images({ artist: "Taylor Swift", kind: "portrait" })
download_image({ url: "..." })
TypeScript library:
import { searchArtistImages, pickBest, downloadImage } from "webfetch-core";

const { candidates } = await searchArtistImages("Drake", "portrait");
const best = pickBest(candidates, { minWidth: 1200 });
if (best) {
  const { cachedPath, sha256 } = await downloadImage(best.url);
  console.log(best.attributionLine, "->", cachedPath);
}
What problem this solves
Manually sourcing an image has four failure modes:
- You don't know the license, so you can't safely ship the result.
- You can't script it — every new site means another afternoon.
- Google's Image Search API is retired; scraping is brittle and ToS-grey.
- No shared cache — you re-download the same file dozens of times.
webfetch fixes all four by federating across direct-source APIs that have stable terms and structured license metadata, ranking candidates license-first, and exposing the result as a single MCP tool.
Providers
| Provider | Covers | License default | Auth | Opt-in |
|---|---|---|---|---|
| wikimedia | portraits, events, logos, history | CC_BY_SA (metadata) | — | no |
| openverse | any CC-licensed content | CC_BY (metadata) | — | no |
| unsplash | high-quality photography | UNSPLASH_LICENSE | UNSPLASH_ACCESS_KEY | no |
| pexels | stock photography | PEXELS_LICENSE | PEXELS_API_KEY | no |
| pixabay | stock photos + illustrations | PIXABAY_LICENSE | PIXABAY_API_KEY | no |
| itunes | album covers, artist portraits | EDITORIAL_LICENSED | — | no |
| musicbrainz-caa | canonical album art | EDITORIAL_LICENSED | — | no |
| spotify | artist + album images | EDITORIAL_LICENSED | SPOTIFY_CLIENT_ID/SECRET | no |
| youtube-thumb | video thumbnails | EDITORIAL_LICENSED | — | yes |
| brave | general web image search | UNKNOWN (+heuristic) | BRAVE_API_KEY | no |
| bing | general web image search | UNKNOWN (+heuristic) | BING_API_KEY | yes |
| serpapi | Google Images + reverse lookup | UNKNOWN (+heuristic) | SERPAPI_KEY | yes |
| browser | headless fallback vs images.google.com | UNKNOWN | — | yes |
| managed-browser | Bright Data managed browser fallback | UNKNOWN | BRIGHTDATA_API_TOKEN | yes |
| flickr | CC / public-domain photography | CC_BY (metadata) | FLICKR_API_KEY | no |
| internet-archive | public-domain / CC archive media | PUBLIC_DOMAIN | — | no |
| smithsonian | Open Access museum media | CC0 | SMITHSONIAN_API_KEY | no |
| nasa | NASA imagery | PUBLIC_DOMAIN | — | no |
| met-museum | The Met Open Access | CC0 | — | no |
| europeana | European cultural heritage | CC_BY (metadata) | EUROPEANA_API_KEY | no |
| library-of-congress | US historical archive | PUBLIC_DOMAIN | — | no |
| wellcome-collection | medical/historical imagery | CC_BY (metadata) | — | no |
| rawpixel | CC0 stock slice | CC0 | RAWPIXEL_API_KEY optional | no |
| burst | Shopify Burst stock photos | CC0 | — | no |
| europeana-archival | Europeana text/manuscript records | CC_BY (metadata) | EUROPEANA_API_KEY | yes |
See docs/PROVIDERS.md for gotchas and rate limits, and
docs/PROVIDER_TUNING.md for per-use-case picks.
Local and cloud modes
The CLI is local-first: by default webfetch search, artist, album,
download, probe, license, and batch call webfetch-core in-process
and use provider API keys from your environment. Pass --cloud or set
WEBFETCH_MODE=cloud to call https://api.getwebfetch.com/v1/* with
WEBFETCH_API_KEY or webfetch config set apiKey wf_live_....
Use local mode when you want direct provider calls and a local cache. Use cloud mode when you want hosted auth, pooled provider keys, managed browser fallback, usage accounting, or team controls.
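The mode selection described above can be sketched as a small pure function. This is a minimal illustration, not the CLI's actual code: `resolveMode` and the `Mode` type are hypothetical names, and only the `--cloud` flag and the `WEBFETCH_MODE` variable come from this README.

```typescript
// Hypothetical helper, not part of webfetch's published API.
type Mode = "local" | "cloud";

function resolveMode(
  argv: string[],
  env: Record<string, string | undefined>,
): Mode {
  // Either the --cloud flag or WEBFETCH_MODE=cloud selects cloud mode.
  if (argv.includes("--cloud")) return "cloud";
  if (env["WEBFETCH_MODE"] === "cloud") return "cloud";
  // Local-first: anything else falls back to in-process provider calls.
  return "local";
}
```

Passing the environment in as an argument (rather than reading `process.env` directly) keeps the sketch easy to test in isolation.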
Why license-first
The only outcome we reject by default is an image we can't justify. A marginally-better photo under an unknown license is worthless to a pipeline that needs to ship without human review. Relevance ties are easy to break; provenance is not.
The ranker sorts by: license tag -> metadata confidence -> resolution ->
provider priority. UNKNOWN is rejected by default (Berne Convention:
most of the web is all-rights-reserved unless proven otherwise). See
docs/LICENSE_POLICY.md.
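As a rough sketch of that sort order, a comparator can chain the four keys so that each later key only breaks ties left by the earlier ones. The `Candidate` shape, the `LICENSE_RANK` ordering, and the "lower priority number wins" convention are all assumptions made for illustration; the real ranker in webfetch-core may differ.

```typescript
// Illustrative shape only; the real candidate type may differ.
interface Candidate {
  license: string;
  confidence: number; // metadata confidence, assumed to be 0..1
  width: number;
  height: number;
  providerPriority: number; // assumed: lower number = preferred provider
}

// Assumed ordering of license tags, most permissive first.
const LICENSE_RANK = new Map<string, number>([
  ["CC0", 0],
  ["PUBLIC_DOMAIN", 1],
  ["CC_BY", 2],
  ["CC_BY_SA", 3],
  ["EDITORIAL_LICENSED", 4],
]);

function licenseRank(c: Candidate): number {
  // Tags missing from the map sort after all known tags.
  return LICENSE_RANK.get(c.license) ?? 99;
}

function rankCandidates(candidates: Candidate[]): Candidate[] {
  return candidates
    .filter((c) => c.license !== "UNKNOWN") // rejected by default
    .sort(
      (a, b) =>
        licenseRank(a) - licenseRank(b) || // 1. license tag
        b.confidence - a.confidence || // 2. metadata confidence
        b.width * b.height - a.width * a.height || // 3. resolution
        a.providerPriority - b.providerPriority, // 4. provider priority
    );
}
```

Note that a high-resolution `UNKNOWN` result never reaches the sort at all, which is the point of ranking license-first.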
Migration: CC0 stock providers
Older webfetch builds treated Unsplash, Pexels, and Pixabay as CC0. Current
builds expose their platform terms explicitly:
| Old tag | New tag | What to check |
|---|---|---|
| CC0 from Unsplash | UNSPLASH_LICENSE | Unsplash terms; not Creative Commons |
| CC0 from Pexels | PEXELS_LICENSE | Pexels terms; not Creative Commons |
| CC0 from Pixabay | PIXABAY_LICENSE | Pixabay terms; not Creative Commons |
Most callers should keep licensePolicy: "safe-only" because it still allows
open, platform, editorial, and press-kit categories while rejecting UNKNOWN.
Pipelines that require only Creative Commons or public-domain assets should use
licensePolicy: "open-only" and update type guards to handle the three
platform tags separately.
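One way to update type guards for the three platform tags is sketched below. The tag strings come from the migration table; `PlatformLicense`, `isPlatformLicense`, `OPEN_TAGS`, and `passesOpenOnly` are hypothetical names, and the set of tags treated as "open" is an assumption based on the provider table rather than the library's actual policy code.

```typescript
// The three platform tags that replaced the old CC0 labelling.
type PlatformLicense = "UNSPLASH_LICENSE" | "PEXELS_LICENSE" | "PIXABAY_LICENSE";

const PLATFORM_TAGS: readonly string[] = [
  "UNSPLASH_LICENSE",
  "PEXELS_LICENSE",
  "PIXABAY_LICENSE",
];

// Hypothetical type guard: narrows a raw tag to the platform-license union.
function isPlatformLicense(tag: string): tag is PlatformLicense {
  return PLATFORM_TAGS.includes(tag);
}

// Assumed set of Creative Commons / public-domain tags for an open-only pipeline.
const OPEN_TAGS: readonly string[] = ["CC0", "PUBLIC_DOMAIN", "CC_BY", "CC_BY_SA"];

// Hypothetical check mirroring what an "open-only" policy would accept.
function passesOpenOnly(tag: string): boolean {
  return OPEN_TAGS.includes(tag) && !isPlatformLicense(tag);
}
```

Code that previously branched on `CC0` for Unsplash, Pexels, or Pixabay results would now need an explicit `isPlatformLicense` branch to handle those terms.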
webfetch vs alternatives
| Capability | webfetch | Raw Google Images | Unsplash-only | Bing CSE |
|---|---|---|---|---|
| Scriptable via API | yes | no (retired) | yes | yes |
| License metadata per result | yes | no | yes (one lic) | partial |
| Covers editorial music art | yes | partial | no | partial |
| Covers CC / public-domain | yes | no | no | no |
| Safe-by-default (rejects UNKNOWN) | yes | n/a | n/a | no |
| Shared content-addressed cache | yes | no | no | no |
| Attribution line pre-built | yes | no | no | no |
| One MCP config line across all IDEs | yes | no | no | no |
| No per-query cost on defaults | yes | n/a | yes | no |
Architecture
                         +------------------+
                         |  webfetch-core   |
                         | (ranker, cache,  |
                         |  license coerce) |
                         +---------+--------+
                                   |
        +-----------------+--------+--------+-----------------+
        |                 |                 |                 |
+-------v------+  +-------v------+  +-------v------+  +-------v------+
|   webfetch   |  | webfetch-mcp |  |  webfetch-   |  |   browser    |
|     CLI      |  |   (stdio)    |  | server (HTTP)|  |  extensions  |
+-------+------+  +-------+------+  +-------+------+  +-------+------+
        |                 |                 |                 |
        +-----------------+--------+--------+-----------------+
                                   |
            +----------------------v----------------------+
            |              provider adapters              |
            | wikimedia  openverse  unsplash   pexels     |
            | pixabay    itunes     mb-caa     spotify    |
            | youtube    brave      bing       serpapi    |
            | flickr     nasa       met        europeana  |
            | loc        wellcome   rawpixel   burst      |
            | browser + managed-browser + archival opt-in |
            +---------------------------------------------+
Every surface shares ~/.webfetch/cache/ keyed by SHA-256, so a download
from the CLI is instantly available to the MCP server and vice versa.
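The idea of a content-addressed cache can be sketched with Node's built-in `crypto` module: the file's location is derived from the SHA-256 of its bytes, so identical downloads from any surface map to the same path. The helper names and the flat directory layout are assumptions; the actual on-disk layout under ~/.webfetch/cache/ is not documented here.

```typescript
import { createHash } from "node:crypto";
import { homedir } from "node:os";
import { join } from "node:path";

// Hex-encoded SHA-256 digest of the raw image bytes.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical layout: one file per digest directly under the cache directory.
function cachePathFor(
  bytes: Uint8Array,
  cacheDir: string = join(homedir(), ".webfetch", "cache"),
): string {
  return join(cacheDir, sha256Hex(bytes));
}
```

Because the key depends only on content, two different URLs serving the same bytes resolve to one cache entry.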
Safety defaults
- licensePolicy: "safe-only" allows open, platform-license, and editorial/press categories; UNKNOWN is rejected.
- safeSearch: "strict".
- Opt-in providers (youtube-thumb, bing, serpapi, browser, managed-browser, europeana-archival) are off by default.
- 20 MB per-download cap, content-type guard, host blocklist.
- robots.txt is respected on generic page probes.
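The download guards listed above (size cap, content-type guard, host blocklist) could be combined roughly as follows. This is a sketch under stated assumptions: `checkDownload` and `GuardResult` are hypothetical names, "20 MB" is taken to mean 20 MiB, and "content-type guard" is taken to mean requiring an `image/*` type.

```typescript
// Hypothetical guard; names and return shape are illustrative only.
const MAX_DOWNLOAD_BYTES = 20 * 1024 * 1024; // assumes "20 MB" means 20 MiB

type GuardResult = { ok: true } | { ok: false; reason: string };

function checkDownload(
  url: string,
  contentType: string | null,
  contentLength: number | null,
  blockedHosts: ReadonlySet<string>,
): GuardResult {
  // Host blocklist: compare the parsed hostname, not the raw URL string.
  const host = new URL(url).hostname;
  if (blockedHosts.has(host)) {
    return { ok: false, reason: "host is blocklisted" };
  }
  // Content-type guard: assumed to require an image/* media type.
  if (!contentType || !contentType.toLowerCase().startsWith("image/")) {
    return { ok: false, reason: "not an image content type" };
  }
  // Size cap: only enforceable up front when the server reports a length.
  if (contentLength !== null && contentLength > MAX_DOWNLOAD_BYTES) {
    return { ok: false, reason: "exceeds 20 MB cap" };
  }
  return { ok: true };
}
```

A real implementation would also need to count bytes while streaming, since a server can omit or misreport `Content-Length`.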
Roadmap
- webfetch watch: daemon mode for repeated queries / incremental refresh.
- Bring-your-own-provider plugin API.
- Hosted tier at getwebfetch.com: pooled provider keys, managed browser fallback, team usage dashboard.
Contributing
Issues and PRs welcome. Run bun install && bun test to get started. See
docs/ for per-area reference docs.
License
MIT.