Adding Responses to Evaluation Datasets

The evaluation dataset you prepared during development can only cover so much ground. That's why it's crucial to continually enhance it in future iterations to improve your LLM system. One approach is to keep manually curating test cases; another is to generate additional synthetic data.

However, arguably the most important method is to create test cases from real user interactions. This means identifying your LLM's failing responses and incorporating them into your evaluation dataset.

info

Confident AI allows you to easily add failing responses to your evaluation datasets in bulk.
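
If you prefer to do this programmatically, here is a minimal sketch using deepeval's `EvaluationDataset` and `Golden`. The dataset alias and the failing responses shown are hypothetical placeholders; in practice you would source them from your own monitoring or feedback pipeline.

```python
from deepeval.dataset import EvaluationDataset, Golden

# Hypothetical failing responses, e.g. collected from production
# logs or flagged by low user feedback ratings.
failing_responses = [
    {
        "input": "How do I cancel my subscription?",
        "actual_output": "I'm not sure, please try again later.",
    },
]

# Wrap each failing response in a Golden so it becomes a reusable test case
goldens = [
    Golden(input=r["input"], actual_output=r["actual_output"])
    for r in failing_responses
]

dataset = EvaluationDataset(goldens=goldens)

# Push the goldens to Confident AI ("My Evaluation Dataset" is a placeholder alias)
dataset.push(alias="My Evaluation Dataset")
```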

Adding Responses to Datasets

1. Select Responses

To select responses for your evaluation dataset, check the checkboxes to the left of each response. You may find it helpful to first filter for responses that meet your criteria for failure (such as a low feedback rating), then check the checkbox to select all filtered responses. Once you've selected the responses, click the Add to Dataset button.

2. Add To Dataset

You will be prompted to select the evaluation dataset to which you want to add the failing responses. Click Add to add the response(s) to your dataset!

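Once added, the updated dataset can be pulled into your next evaluation run. A sketch, again assuming deepeval and the same placeholder alias:

```python
from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()

# Pull the latest goldens, including the newly added failing responses
dataset.pull(alias="My Evaluation Dataset")
```

This closes the loop: failing production responses become regression test cases in your subsequent evaluations.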