<h1 align="center" style="border-bottom: none"> <div> <a href="https://www.comet.com/site/products/opik/?from=llm&utm_source=opik&utm_medium=github&utm_content=header_img&utm_campaign=opik"><picture> <source media="(prefers-color-scheme: dark)" srcset="/apps/opik-documentation/documentation/static/img/logo-dark-mode.svg"> <source media="(prefers-color-scheme: light)" srcset="/apps/opik-documentation/documentation/static/img/opik-logo.svg"> <img alt="Comet Opik logo" src="/apps/opik-documentation/documentation/static/img/opik-logo.svg" width="200" /> </picture></a> <br> Opik </div> Open source LLM evaluation framework<br> </h1>
<p align="center"> From RAG chatbots to code assistants to complex agentic pipelines and beyond, build LLM systems that run better, faster, and cheaper with tracing, evaluations, and dashboards. </p>
<div align="center">
<a target="_blank" href="https://colab.research.google.com/github/comet-ml/opik/blob/master/apps/opik-documentation/documentation/docs/cookbook/opik_quickstart.ipynb">
<!-- <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quickstart In Colab"/> --> </a>
</div>
<p align="center"> <a href="https://www.comet.com/site/products/opik/?from=llm&utm_source=opik&utm_medium=github&utm_content=website_button&utm_campaign=opik"><b>Website</b></a> • <a href="https://chat.comet.com"><b>Slack community</b></a> • <a href="https://x.com/Cometml"><b>Twitter</b></a> • <a href="https://www.comet.com/docs/opik/?from=llm&utm_source=opik&utm_medium=github&utm_content=docs_button&utm_campaign=opik"><b>Documentation</b></a> </p>
🚀 What is Opik?
Opik is an open-source platform for evaluating, testing, and monitoring LLM applications, built by Comet.
<br>
You can use Opik for:

- **Development:**
  - **Tracing:** Track all LLM calls and traces during development and production (Quickstart, Integrations)
  - **Annotations:** Annotate your LLM calls by logging feedback scores using the Python SDK or the UI.
  - **Playground:** Try out different prompts and models in the prompt playground.
- **Evaluation:** Automate the evaluation process of your LLM application:
  - **Datasets and Experiments:** Store test cases and run experiments (Datasets, Evaluate your LLM Application)
  - **LLM as a judge metrics:** Use Opik's LLM as a judge metrics for complex issues like hallucination detection, moderation, and RAG evaluation (Answer Relevance, Context Precision)
  - **CI/CD integration:** Run evaluations as part of your CI/CD pipeline using our PyTest integration
- **Production Monitoring:**
  - **Log all your production traces:** Opik has been designed to support high volumes of traces, making it easy to monitor your production applications. Even small deployments can ingest more than 40 million traces per day!
  - **Monitoring dashboards:** Review your feedback scores, trace count, and tokens over time in the Opik dashboard.
  - **Online evaluation metrics:** Easily score all your production traces with LLM as a Judge metrics and identify issues in your production LLM application using Opik's online evaluation metrics.
[!TIP]
If you are looking for features that Opik doesn't have today, please raise a new Feature request 🚀
<br>
🛠️ Installation
Opik is available as a fully open-source local installation or as a hosted solution on Comet.com. The easiest way to get started with Opik is by creating a free Comet account at comet.com.
If you'd like to self-host Opik, you can do so by cloning the repository and starting the platform using Docker Compose:
```bash
# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git

# Navigate to the docker-compose directory
cd opik/deployment/docker-compose

# Optionally, force a pull of the latest images
docker compose pull

# Start the Opik platform
docker compose up --detach

# You can now visit http://localhost:5173 in your browser!
```
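Once the containers are up, you can check that everything is running before opening the UI:

```bash
docker compose ps
```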
For more information about the different deployment options, please see our deployment guides:
| Installation methods | Docs link |
|---|---|
| Local instance | |
| Kubernetes | |
🏁 Get Started
To get started, you will first need to install the Python SDK:

```bash
pip install opik
```

Once the SDK is installed, you can configure it by running the `opik configure` command:

```bash
opik configure
```

This will allow you to configure Opik locally by setting the correct local server address or, if you're using the Cloud platform, by setting your API key.
[!TIP]
You can also call the `opik.configure(use_local=True)` method from your Python code to configure the SDK to run on the local installation.
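For reference, a minimal configuration sketch covering both setups (the `api_key` and `workspace` parameter names for the hosted platform are assumptions; check the configuration docs for the exact signature):

```python
import opik

# Local installation, as in the tip above
opik.configure(use_local=True)

# Hosted platform (parameter names assumed; see the configuration docs):
# opik.configure(api_key="YOUR_API_KEY", workspace="YOUR_WORKSPACE")
```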
You are now ready to start logging traces using the Python SDK.
📝 Logging Traces
The easiest way to get started is to use one of our integrations. Opik supports:
| Integration | Description | Documentation | Try in Colab |
|---|---|---|---|
| OpenAI | Log traces for all OpenAI LLM calls | Documentation | |
| LiteLLM | Call any LLM model using the OpenAI format | Documentation | |
| LangChain | Log traces for all LangChain LLM calls | Documentation | |
| Haystack | Log traces for all Haystack calls | Documentation | |
| Anthropic | Log traces for all Anthropic LLM calls | Documentation | |
| Bedrock | Log traces for all Bedrock LLM calls | Documentation | |
| CrewAI | Log traces for all CrewAI calls | Documentation | |
| DeepSeek | Log traces for all DeepSeek LLM calls | Documentation | |
| DSPy | Log traces for all DSPy runs | Documentation | |
| Gemini | Log traces for all Gemini LLM calls | Documentation | |
| Groq | Log traces for all Groq LLM calls | Documentation | |
| Guardrails | Log traces for all Guardrails validations | Documentation | |
| Instructor | Log traces for all LLM calls made with Instructor | Documentation | |
| LangGraph | Log traces for all LangGraph executions | Documentation | |
| LlamaIndex | Log traces for all LlamaIndex LLM calls | Documentation | |
| Ollama | Log traces for all Ollama LLM calls | Documentation | |
| Predibase | Fine-tune and serve open-source Large Language Models | Documentation | |
| Ragas | Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines | Documentation | |
| watsonx | Log traces for all watsonx LLM calls | Documentation | |
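For example, here is a minimal sketch of the OpenAI integration, assuming the `track_openai` wrapper described in the integration docs (the model name is illustrative):

```python
from openai import OpenAI
from opik.integrations.openai import track_openai

# Wrapping the client logs every completion call as a trace
client = track_openai(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "What is Opik?"}],
)
print(response.choices[0].message.content)
```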
[!TIP]
If the framework you are using is not listed above, feel free to open an issue or submit a PR with the integration.
If you are not using any of the frameworks above, you can also use the `track` function decorator to log traces:

```python
import opik

opik.configure(use_local=True)  # Run locally

@opik.track
def my_llm_function(user_question: str) -> str:
    # Your LLM code here
    return "Hello"
```
[!TIP]
The track decorator can be used in conjunction with any of our integrations and can also be used to track nested function calls.
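For instance, a minimal sketch of nested tracking (the function names and return values are illustrative):

```python
import opik

@opik.track
def retrieve_context(question: str) -> list[str]:
    # Illustrative retrieval step; logged as a nested span
    return ["France is a country in Europe."]

@opik.track
def answer_question(question: str) -> str:
    context = retrieve_context(question)
    # Your LLM call would go here
    return "Paris"

answer_question("What is the capital of France?")  # One trace, with the retrieval step nested inside
```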
🧑‍⚖️ LLM as a Judge metrics
The Python Opik SDK includes a number of LLM as a judge metrics to help you evaluate your LLM application. Learn more about it in the metrics documentation.
To use them, simply import the relevant metric and use the `score` function:

```python
from opik.evaluation.metrics import Hallucination

metric = Hallucination()
score = metric.score(
    input="What is the capital of France?",
    output="Paris",
    context=["France is a country in Europe."],
)
print(score)
```
Opik also includes a number of pre-built heuristic metrics as well as the ability to create your own. Learn more about it in the metrics documentation.
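As a sketch of what a custom metric can look like, assuming the `base_metric.BaseMetric` and `score_result.ScoreResult` helpers described in the metrics documentation:

```python
from opik.evaluation.metrics import base_metric, score_result

class ExactMatch(base_metric.BaseMetric):
    """Illustrative heuristic metric: 1.0 if the output matches the reference exactly."""

    def __init__(self, name: str = "exact_match"):
        super().__init__(name=name)

    def score(self, output: str, reference: str, **ignored_kwargs) -> score_result.ScoreResult:
        return score_result.ScoreResult(
            name=self.name,
            value=1.0 if output.strip() == reference.strip() else 0.0,
        )

print(ExactMatch().score(output="Paris", reference="Paris").value)  # 1.0
```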
🔍 Evaluating your LLM Application
Opik allows you to evaluate your LLM application during development through Datasets and Experiments.
You can also run evaluations as part of your CI/CD pipeline using our PyTest integration.
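A minimal sketch of an evaluation run, assuming the `evaluate` entry point and dataset client described in the evaluation docs (the dataset name and task logic are illustrative):

```python
from opik import Opik
from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination

client = Opik()
dataset = client.get_or_create_dataset(name="demo-questions")  # illustrative name
dataset.insert([
    {"input": "What is the capital of France?"},
])

def evaluation_task(item: dict) -> dict:
    # Your application logic goes here; hard-coded for the sketch
    return {
        "input": item["input"],
        "output": "Paris",
        "context": ["France is a country in Europe."],
    }

evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[Hallucination()],
)
```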
⭐ Star Us on GitHub
If you find Opik useful, please consider giving us a star! Your support helps us grow our community and continue improving the product.
<img src="https://github.com/user-attachments/assets/ffc208bb-3dc0-40d8-9a20-8513b5e4a59d" alt="Opik GitHub Star History" width="600"/>
🤝 Contributing
There are many ways to contribute to Opik:
- Submit bug reports and feature requests
- Review the documentation and submit Pull Requests to improve it
- Speak or write about Opik and let us know
- Upvote popular feature requests to show your support
To learn more about how to contribute to Opik, please see our contributing guidelines.