
<h1 align="center" style="border-bottom: none"> <div> <a href="https://www.comet.com/site/products/opik/?from=llm&utm_source=opik&utm_medium=github&utm_content=header_img&utm_campaign=opik"><picture> <source media="(prefers-color-scheme: dark)" srcset="/apps/opik-documentation/documentation/static/img/logo-dark-mode.svg"> <source media="(prefers-color-scheme: light)" srcset="/apps/opik-documentation/documentation/static/img/opik-logo.svg"> <img alt="Comet Opik logo" src="/apps/opik-documentation/documentation/static/img/opik-logo.svg" width="200" /> </picture></a> <br> Opik </div> Open source LLM evaluation framework<br> </h1>

<p align="center"> From RAG chatbots to code assistants to complex agentic pipelines and beyond, build LLM systems that run better, faster, and cheaper with tracing, evaluations, and dashboards. </p>

<div align="center">

<a target="_blank" href="https://colab.research.google.com/github/comet-ml/opik/blob/master/apps/opik-documentation/documentation/docs/cookbook/opik_quickstart.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quickstart In Colab"/></a>

</div>

<p align="center"> <a href="https://www.comet.com/site/products/opik/?from=llm&utm_source=opik&utm_medium=github&utm_content=website_button&utm_campaign=opik"><b>Website</b></a> • <a href="https://chat.comet.com"><b>Slack community</b></a> • <a href="https://x.com/Cometml"><b>Twitter</b></a> • <a href="https://www.comet.com/docs/opik/?from=llm&utm_source=opik&utm_medium=github&utm_content=docs_button&utm_campaign=opik"><b>Documentation</b></a> </p>


## 🚀 What is Opik?

Opik is an open-source platform, built by Comet, for evaluating, testing, and monitoring LLM applications.

<br>

You can use Opik for:

- **Development:**

  - **Tracing:** Track all LLM calls and traces during development and production (see the Quickstart and Integrations docs).

  - **Annotations:** Annotate your LLM calls by logging feedback scores using the Python SDK or the UI (see the sketch after this list).

  - **Playground:** Try out different prompts and models in the prompt playground.

- **Evaluation:** Automate the evaluation of your LLM application with datasets, experiments, and LLM-as-a-Judge metrics.

- **Production Monitoring:**

  - **Log all your production traces:** Opik is designed to support high volumes of traces, making it easy to monitor your production applications. Even small deployments can ingest more than 40 million traces per day!

  - **Monitoring dashboards:** Review your feedback scores, trace counts, and token usage over time in the Opik Dashboard.

  - **Online evaluation metrics:** Score all your production traces with LLM-as-a-Judge metrics to identify issues in your production LLM application.
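Picking up the annotations item above: a minimal sketch of logging a feedback score from code, assuming the `opik_context.update_current_trace` helper exposed by the Python SDK (check the annotations docs for the current API):

```python
import opik
from opik import opik_context

opik.configure(use_local=True)

@opik.track
def answer(question: str) -> str:
    result = "Paris"  # stand-in for a real LLM call
    # Attach a feedback score to the trace created by @opik.track.
    opik_context.update_current_trace(
        feedback_scores=[{"name": "user_feedback", "value": 1.0}]
    )
    return result

answer("What is the capital of France?")
```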

> [!TIP]
> If you are looking for features that Opik doesn't have today, please raise a new Feature request 🚀

<br>

## 🛠️ Installation

Opik is available as a fully open-source local installation or as a hosted solution on Comet.com. The easiest way to get started with Opik is by creating a free Comet account at comet.com.

If you'd like to self-host Opik, you can do so by cloning the repository and starting the platform using Docker Compose:

```bash
# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git

# Navigate to the docker-compose directory
cd opik/deployment/docker-compose

# Optionally, force a pull of the latest images
docker compose pull

# Start the Opik platform
docker compose up --detach

# You can now visit http://localhost:5173 in your browser!
```

For more information about the different deployment options, please see our deployment guides:

| Installation method | Docs link        |
| ------------------- | ---------------- |
| Local instance      | Local Deployment |
| Kubernetes          | Kubernetes       |

## 🏁 Get Started

To get started, first install the Python SDK:

```bash
pip install opik
```

Once the SDK is installed, you can configure it by running the `opik configure` command:

```bash
opik configure
```

This will prompt you to configure Opik: set the local server address if you are self-hosting, or your API key if you are using the Cloud platform.

> [!TIP]
> You can also call the `opik.configure(use_local=True)` method from your Python code to configure the SDK to run against a local installation.
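For reference, a minimal sketch of configuring the SDK from code for either deployment; the `api_key` keyword for the hosted platform is an assumption here, so check the configuration docs for the parameters your SDK version accepts:

```python
import opik

# Self-hosted: point the SDK at the local Docker Compose deployment.
opik.configure(use_local=True)

# Hosted (Comet.com): supply your API key instead (keyword name assumed):
# opik.configure(api_key="YOUR_API_KEY")
```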

You are now ready to start logging traces using the Python SDK.

## 📝 Logging Traces

The easiest way to get started is to use one of our integrations. Opik supports:

| Integration | Description                                                                  | Documentation | Try in Colab             |
| ----------- | ---------------------------------------------------------------------------- | ------------- | ------------------------ |
| OpenAI      | Log traces for all OpenAI LLM calls                                           | Documentation | Open Quickstart In Colab |
| LiteLLM     | Call any LLM model using the OpenAI format                                    | Documentation | Open Quickstart In Colab |
| LangChain   | Log traces for all LangChain LLM calls                                        | Documentation | Open Quickstart In Colab |
| Haystack    | Log traces for all Haystack calls                                             | Documentation | Open Quickstart In Colab |
| Anthropic   | Log traces for all Anthropic LLM calls                                        | Documentation | Open Quickstart In Colab |
| Bedrock     | Log traces for all Bedrock LLM calls                                          | Documentation | Open Quickstart In Colab |
| CrewAI      | Log traces for all CrewAI calls                                               | Documentation | Open Quickstart In Colab |
| DeepSeek    | Log traces for all DeepSeek LLM calls                                         | Documentation |                          |
| DSPy        | Log traces for all DSPy runs                                                  | Documentation | Open Quickstart In Colab |
| Gemini      | Log traces for all Gemini LLM calls                                           | Documentation | Open Quickstart In Colab |
| Groq        | Log traces for all Groq LLM calls                                             | Documentation | Open Quickstart In Colab |
| Guardrails  | Log traces for all Guardrails validations                                     | Documentation | Open Quickstart In Colab |
| Instructor  | Log traces for all LLM calls made with Instructor                             | Documentation | Open Quickstart In Colab |
| LangGraph   | Log traces for all LangGraph executions                                       | Documentation | Open Quickstart In Colab |
| LlamaIndex  | Log traces for all LlamaIndex LLM calls                                       | Documentation | Open Quickstart In Colab |
| Ollama      | Log traces for all Ollama LLM calls                                           | Documentation | Open Quickstart In Colab |
| Predibase   | Fine-tune and serve open-source Large Language Models                         | Documentation | Open Quickstart In Colab |
| Ragas       | Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines  | Documentation | Open Quickstart In Colab |
| watsonx     | Log traces for all watsonx LLM calls                                          | Documentation | Open Quickstart In Colab |

> [!TIP]
> If the framework you are using is not listed above, feel free to open an issue or submit a PR with the integration.
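To illustrate how these integrations are typically wired up, here is a minimal sketch using the OpenAI integration's `track_openai` wrapper (the model name is illustrative; see the integration docs for the exact import path in your SDK version):

```python
from openai import OpenAI
from opik.integrations.openai import track_openai

# Wrap the OpenAI client so every completion is logged as a trace.
client = track_openai(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```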

If you are not using any of the frameworks above, you can also use the `track` function decorator to log traces:

```python
import opik

opik.configure(use_local=True)  # Run locally

@opik.track
def my_llm_function(user_question: str) -> str:
    # Your LLM code here

    return "Hello"
```

> [!TIP]
> The `track` decorator can be used in conjunction with any of our integrations and can also be used to track nested function calls.
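For example, a minimal sketch of tracking nested calls, where each decorated function appears as a nested span within the same trace (the function bodies are illustrative stand-ins):

```python
import opik

opik.configure(use_local=True)

@opik.track
def retrieve_context(question: str) -> list:
    # Stand-in for a retrieval step, e.g. querying a vector store.
    return ["France is a country in Europe."]

@opik.track
def generate_answer(question: str) -> str:
    context = retrieve_context(question)  # logged as a nested span
    # Stand-in for an LLM call using `question` and `context`.
    return "Paris"

generate_answer("What is the capital of France?")
```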

## 🧑‍⚖️ LLM as a Judge metrics

The Opik Python SDK includes a number of LLM-as-a-Judge metrics to help you evaluate your LLM application. Learn more in the metrics documentation.

To use them, simply import the relevant metric and call its `score` method:

```python
from opik.evaluation.metrics import Hallucination

metric = Hallucination()
score = metric.score(
    input="What is the capital of France?",
    output="Paris",
    context=["France is a country in Europe."],
)
print(score)
```

Opik also includes a number of pre-built heuristic metrics, as well as the ability to create your own. Learn more in the metrics documentation.
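A minimal sketch of a custom metric, assuming the `base_metric.BaseMetric` and `score_result.ScoreResult` classes exposed by the SDK (consult the metrics documentation for the current base-class contract):

```python
from opik.evaluation.metrics import base_metric, score_result

class AnswerLength(base_metric.BaseMetric):
    """Toy heuristic metric: passes when the answer is non-empty and short."""

    def __init__(self, name: str = "answer_length"):
        self.name = name

    def score(self, output: str, **ignored_kwargs) -> score_result.ScoreResult:
        ok = 0 < len(output) <= 280
        return score_result.ScoreResult(
            name=self.name,
            value=1.0 if ok else 0.0,
            reason=f"Answer length was {len(output)} characters.",
        )

print(AnswerLength().score(output="Paris"))
```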

## 🔍 Evaluating your LLM Application

Opik allows you to evaluate your LLM application during development through Datasets and Experiments.

You can also run evaluations as part of your CI/CD pipeline using our PyTest integration.
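To make that workflow concrete, here is a minimal sketch of an evaluation run, assuming the `Opik` client and the `evaluate` helper from the SDK; the dataset contents and task function are illustrative:

```python
from opik import Opik
from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination

client = Opik()

# Create (or reuse) a dataset of evaluation items.
dataset = client.get_or_create_dataset(name="capital-questions")
dataset.insert([
    {
        "input": "What is the capital of France?",
        "context": ["France is a country in Europe."],
    },
])

def evaluation_task(item: dict) -> dict:
    # Replace with a call to your real application; the keys returned here
    # are matched against the scoring metrics' parameters.
    return {
        "input": item["input"],
        "context": item["context"],
        "output": "Paris",
    }

evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[Hallucination()],
)
```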

## ⭐ Star Us on GitHub

If you find Opik useful, please consider giving us a star! Your support helps us grow our community and continue improving the product.

<img src="https://github.com/user-attachments/assets/ffc208bb-3dc0-40d8-9a20-8513b5e4a59d" alt="Opik GitHub Star History" width="600"/>

## 🤝 Contributing

There are many ways to contribute to Opik, from filing issues and feature requests to submitting pull requests. To learn more, please see our contributing guidelines.
