Introduction
Quick Summary
Confident AI was designed for LLM teams to quality assure LLM applications from development to production. It is an all-in-one platform that unlocks deepeval's full potential by allowing you to:
- Evaluate LLM applications on Confident AI's infrastructure with proprietary models
- Keep track of the evaluation history of your LLM application
- Centralize and standardize evaluation datasets on the cloud
- Trace and debug LLM applications during evaluation
- Run online evaluations of LLM applications in production
- Generate evaluation-based summary reports for relevant stakeholders
Why Confident AI?
As you try to evaluate and monitor LLM applications in both development and production environments, you might face several challenges:
- Evaluation and Testing Quality: Running evaluations locally with deepeval works well, but you will often see flaky metric scores when using your own model of choice for evaluation. By running evaluations on Confident AI's infrastructure, you get the latest metric implementations with the best evaluation models available for each metric.
- Dataset Quality Assurance: Keeping track of which test cases are ready for evaluation can become cumbersome, and miscommunication between expert data annotators and engineers regarding test case specifics can lead to inefficiencies.
- Experimentation Difficulties: Finding an easy way to experiment with different LLM system implementations and identify the best one is essential but often challenging and unintuitive.
- Identifying Issues at Scale: Spotting unsatisfactory responses in production at scale can be daunting, especially for a complex LLM system architecture.
Here's a diagram outlining how Confident AI works:
Log in to Confident AI
Everything in deepeval is already automatically integrated with Confident AI, including deepeval's custom metrics. To start using Confident AI with deepeval, simply log in from the CLI:
deepeval login
Follow the instructions displayed in the CLI (create an account, get your Confident API key, and paste it into the CLI), and you're good to go.
You can also log in directly in Python if you already have a Confident API key:
import deepeval

deepeval.login_with_confident_api_key("your-confident-api-key")
Or, via the CLI:
deepeval login --confident-api-key "your-confident-api-key"
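In scripts and CI jobs you may prefer not to hard-code the key. The following is a minimal sketch, not an official pattern: it reads the key from an environment variable and passes it to the Python login call shown above. The helper names `resolve_confident_api_key` and `login_from_env` are hypothetical, and the variable name `CONFIDENT_API_KEY` is an assumption chosen for this example. The deepeval import is deferred so that the helper can be exercised with a stand-in login function.

```python
import os


def resolve_confident_api_key(env_var="CONFIDENT_API_KEY"):
    """Return the API key stored in env_var (variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before logging in to Confident AI.")
    return key


def login_from_env(login_fn=None, env_var="CONFIDENT_API_KEY"):
    """Hypothetical helper: log in with a key taken from the environment.

    login_fn defaults to deepeval.login_with_confident_api_key, the call
    shown in this section; pass another callable to test without deepeval.
    """
    key = resolve_confident_api_key(env_var)
    if login_fn is None:
        import deepeval  # imported lazily so the helper loads without deepeval

        login_fn = deepeval.login_with_confident_api_key
    login_fn(key)
    return key
```

With the variable exported in your shell, calling `login_from_env()` once at the start of a script should have the same effect as the hard-coded call, while keeping the key out of source control.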