Defining Metrics in Experiments
Confident AI allows anyone, including non-technical users such as domain experts or human reviewers, to define, select, and configure metrics without writing a single line of code, simply by creating experiments on the platform. This includes LLM system metrics, conversational metrics, and custom metrics.
An experiment in Confident AI is a collection of metrics that you can use to benchmark your LLM in a contained way. Running an experiment produces a test run, which contains the evaluation results of your LLM app's test cases.
Setting Up
Log in to Confident AI by heading to the platform or running the following command in your CLI.
deepeval login
Creating your Custom Metrics
To create a complete experiment, you'll first need to define your custom metrics, if applicable. In our medical chatbot use case, we'll define two: Diagnosis Specificity and Overdiagnosis. Start by navigating to the Metrics page, selecting the Custom Metrics tab, and clicking Create Metric.
Specify the custom metric's name, its criteria or evaluation steps (we recommend defining evaluation steps for granular control), and the parameters the metric will use to evaluate your test case. Once you've finalized the details, click Create New Metric.
To learn more about test cases and their parameters in DeepEval, visit this section.
Once you've finished defining all your custom metrics, they'll appear here like this:
Creating an Experiment
Next, head to the Evaluation & Testing page and click Create New Experiment, where you'll be presented with all the metrics available in DeepEval as well as the custom ones you've defined.
We'll name our experiment Test Medical Chatbot and select all the relevant metrics: 5 RAG metrics, Hallucination, Tool Correctness, as well as our 2 custom metrics (Diagnosis Specificity and Overdiagnosis). Click Create Experiment.