Managing Datasets
Quick Summary
Confident AI provides your team a centralized place to create and edit evaluation datasets. You can manage evaluation datasets either using deepeval
or directly on Confident AI.
Create Your Dataset Using DeepEval
Creating an EvaluationDataset
on Confident using deepeval
is a two-step process:
- Create a dataset locally (same as how you would create a dataset as shown in the datasets section)
- Push the created dataset to Confident
Create A Dataset Locally
from deepeval.test_case import LLMTestCase
from deepeval.dataset import EvaluationDataset
original_dataset = [
{
"input": "What are your operating hours?",
"actual_output": "...",
"context": [
"Our company operates from 10 AM to 6 PM, Monday to Friday.",
"We are closed on weekends and public holidays.",
"Our customer service is available 24/7.",
],
},
{
"input": "Do you offer free shipping?",
"actual_output": "...",
"expected_output": "Yes, we offer free shipping on orders over $50.",
},
{
"input": "What is your return policy?",
"actual_output": "...",
},
]
test_cases = []
for datapoint in original_dataset:
input = datapoint.get("input", None)
actual_output = datapoint.get("actual_output", None)
expected_output = datapoint.get("expected_output", None)
context = datapoint.get("context", None)
test_case = LLMTestCase(
input=input,
actual_output=actual_output,
expected_output=expected_output,
context=context
)
test_cases.append(test_case)
dataset = EvaluationDataset(test_cases=test_cases)
Push Dataset to Confident AI
After creating your EvaluationDataset
, all you have to do is push it to Confident by providing an alias
as an unique identifier:
# Provide an alias when pushing a dataset
dataset.push(alias="My Confident Dataset")
You can choose to overwrite or append to an existing dataset if an existing dataset with the same alias already exist.
dataset.push(alias="My Confident Dataset", overwrite=False)
deepeval
will prompt you in the terminal if no value for overwrite
is provided.
Create Your Dataset on Confident AI
You can alternatively create an evaluation dataset directly on Confident.
- Login to Confident.
- Find "Datasets" on the left navigation bar to go to the "Datasets" page.
- Click on the "Create Dataset" button.
- Enter an "alias", and click "Create". Click on the newly created dataset to create your first golden.
- In the "Edit Dataset" page, click on the "Actions" button.
- On the dropdown menu, click on the "Create Golden" button.
- Create your "Golden", click "Save", and repeat steps 5-7 until you're done building your dataset.
What is a Golden?
A "Golden" is what makes up an evaluation dataset and is very similar to a test case in deepeval
, but they:
- do not require an
actual_output
, so whilst test cases are always ready for evaluation, a golden isn't. - only exists within an
EvaluationDataset()
, while test cases can be defined anywhere. - contains an extra
additional_metadata
field, which is a dictionary you can define on Confident. Allows you to do some extra preprocessing on your dataset (eg., generating a custom LLMactual_output
based on some variables inadditional_metadata
) before evaluation.
We introduced the concept of goldens because it allows you to create evaluation datasets on Confident without needing pre-computed actual_output
s. This is especially helpful if you are looking to generate responses from your LLM application at evaluation time.