Introduction
Quick Summary
Confident AI was designed for LLM teams to quality assure LLM applications from development to production. It is an all-in-one platform that unlocks deepeval's full potential by allowing you to:
- Evaluate LLM applications on Confident AI's infrastructure with proprietary models
- Keep track of the evaluation history of your LLM application
- Centralize and standardize evaluation datasets on the cloud
- Trace and debug LLM applications during evaluation
- Run online evaluations of LLM applications in production
- Generate evaluation-based summary reports for relevant stakeholders
Why Confident AI?
As you try to evaluate and monitor LLM applications in both development and production environments, you might face several challenges:
- Evaluation and Testing Quality: Running evaluations locally on deepeval is great, but you will often find flaky metric scores when using your own model of choice for evaluation. By running evaluations on Confident AI's infrastructure, you get the latest metric implementations with the best evaluation model available for each particular metric.
- Dataset Quality Assurance: Keeping track of which test cases are ready for evaluation can become cumbersome, and miscommunication between expert data annotators and engineers regarding test case specifics can lead to inefficiencies.
- Experimentation Difficulties: Finding an easy way to experiment with the best LLM system implementations is essential but often challenging and unintuitive.
- Identifying Issues at Scale: Spotting unsatisfactory responses in production at scale can be daunting, especially for a complex LLM system architecture.
Here's a diagram outlining how Confident AI works:
Login to Confident AI
Everything in deepeval is already automatically integrated with Confident AI, including deepeval's custom metrics. To start using Confident AI with deepeval, simply log in via the CLI:
deepeval login
Follow the instructions displayed in the CLI (create an account, get your Confident API key, and paste it into the CLI), and you're good to go.
You can also log in directly in Python if you already have a Confident API key:
import deepeval

deepeval.login_with_confident_api_key("your-confident-api-key")
Or, via the CLI:
deepeval login --confident-api-key "your-confident-api-key"
Setting Up Your Evaluation Model
If you choose to run evaluations on Confident AI, you'll need to specify which evaluation model to use when running LLM-as-a-judge metrics. Any feature that runs evaluations on Confident AI's infrastructure requires an evaluation model to be configured.
To set up an evaluation model, you can either:
- Use OpenAI by supplying an OpenAI API key.
- Use a custom endpoint as your evaluation model.
Using Custom LLMs
To use a custom LLM, go to Project Settings > Evaluation Model, and select the "Custom" option under the model provider dropdown. You'll have the opportunity to supply a model "Name", as well as an "Endpoint". The name is straightforward, since it is merely for identification purposes, but for the model endpoint, you'll have to ensure it follows these rules:
- It MUST accept a `POST` request over HTTPS, and must be reachable over the internet.
- It MUST return a `200` status where the response body is a plain string (not JSON, nothing else, just a string) that represents the generated text of your evaluation model based on the given request body (which contains the `input`).
- It MUST correctly parse the request body to extract the `input` and use your evaluation model to generate a response based on this `input`.
Here is what the structure of the request body looks like:
{
"input": "Given the metric score, tell me why..."
}
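For illustration only, here is a minimal sketch of an endpoint that satisfies these rules, assuming Flask and the OpenAI Python client; the /evaluate path and the gpt-4o model are placeholder choices, not requirements:

# Minimal sketch of a compliant endpoint (assumes Flask and the OpenAI
# Python client). Deploy it behind HTTPS so it is reachable over the internet.
from flask import Flask, request
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()  # reads OPENAI_API_KEY from the environment

@app.route("/evaluate", methods=["POST"])  # placeholder path
def evaluate():
    # Parse the request body to extract the "input" field.
    body = request.get_json()
    prompt = body["input"]

    # Use your evaluation model to generate a response for this input.
    completion = client.chat.completions.create(
        model="gpt-4o",  # assumed model; swap in your own
        messages=[{"role": "user", "content": prompt}],
    )

    # Return the generated text as a plain string (not JSON) with a 200 status.
    return completion.choices[0].message.content, 200, {"Content-Type": "text/plain"}

if __name__ == "__main__":
    app.run(port=8000)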
Please test that your endpoint meets all the given requirements before continuing. This step might be harder to debug than others, so aim to log any errors and reach out to support@confident-ai.com if you run into issues during setup.
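As a rough sanity check, a short script along these lines (assuming the requests library and a placeholder URL) can confirm that the endpoint returns a 200 status with a plain-string body:

# Quick sanity check for a custom endpoint; replace the placeholder URL
# with your actual HTTPS endpoint.
import requests

response = requests.post(
    "https://your-endpoint.example.com/evaluate",  # placeholder URL
    json={"input": "Given the metric score, tell me why..."},
    timeout=30,
)

assert response.status_code == 200, f"Expected 200, got {response.status_code}"
# The body must be the generated text itself, not a JSON object.
print(response.text)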
If you find a model provider that is not supported, most of the time it is because no one has asked for it. Simply email support@confident-ai.com if that is the case.