
Maintaining a Dataset

In the previous section, we passed all test cases and generated summaries that met our evaluation criteria on a set of five documents. However, five documents are too few for a robust evaluation. To get reliable results, you'll need to maintain a larger dataset.

Creating a Dataset

You can create a dataset on Confident AI from a test run you've already completed, and start building from the five documents we ran evaluations on. Creating the dataset is as simple as clicking "Save as new dataset" and giving it a unique name.


You can also create a dataset by uploading a CSV file or by creating goldens from scratch. Goldens are test cases where actual_outputs haven't been populated yet. They make up the golden dataset, and you can learn more about them here.
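To make the idea of a golden concrete, here is a minimal plain-Python sketch of turning CSV rows into goldens. It is not the Confident AI upload feature or the deepeval API: the `Golden` dataclass, the `goldens_from_csv` helper, and the `input` / `expected_output` column names are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Golden:
    """Illustrative stand-in for a golden: an input with no actual_output yet."""
    input: str
    expected_output: Optional[str] = None


def goldens_from_csv(csv_text: str) -> list[Golden]:
    """Build one golden per CSV row; empty expected_output cells become None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Golden(input=row["input"], expected_output=row.get("expected_output") or None)
        for row in reader
    ]


# Hypothetical CSV contents; the column names are an assumption.
SAMPLE_CSV = (
    "input,expected_output\n"
    "Full text of a lease agreement,\n"
    "Full text of an NDA,A mutual NDA with a two-year term.\n"
)

goldens = goldens_from_csv(SAMPLE_CSV)
```

Note that no golden carries an `actual_output`: that field only appears once a summarizer has been run on the input.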

You'll notice that only the inputs corresponding to your document texts are populated. This allows you to generate different summaries (actual_output) for each iteration of your summarizer.
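The sketch below illustrates why leaving `actual_output` empty is useful: the same stored inputs can be paired with fresh outputs from each version of the summarizer. Everything here is hypothetical (the `SummaryTestCase` dataclass, `build_test_cases`, and the two toy summarizers); a real setup would call your actual summarizer and use the test case type from your evaluation library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SummaryTestCase:
    """Illustrative test case: a dataset input plus a freshly generated summary."""
    input: str
    actual_output: str


def build_test_cases(
    inputs: list[str], summarize: Callable[[str], str]
) -> list[SummaryTestCase]:
    """Run the given summarizer on every input to populate actual_output."""
    return [SummaryTestCase(input=text, actual_output=summarize(text)) for text in inputs]


# Two toy "iterations" of a summarizer, standing in for real model calls.
def summarizer_v1(text: str) -> str:
    return text.split(".")[0] + "."  # keep only the first sentence


def summarizer_v2(text: str) -> str:
    return " ".join(text.split()[:3])  # keep only the first three words


documents = ["The tenant pays rent monthly. The lease runs for two years."]

cases_v1 = build_test_cases(documents, summarizer_v1)
cases_v2 = build_test_cases(documents, summarizer_v2)
```

The inputs stay fixed across both runs, so any difference in evaluation scores can be attributed to the summarizer rather than to the data.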

Maintaining the Dataset

Building a dataset is no easy task, especially if you're building a domain-specific legal document summarizer. Very likely, you'll be adding reference metrics (metrics that require a ground truth expected_output), which means you'll need legal experts to populate what constitutes an ideal summary.


Your domain experts can use Confident AI to add, edit, annotate, and comment on test cases while building them, as well as mark whether each test case is finalized and ready for evaluation.

Once you have a complete dataset of documents, you can pull it into your code with just two lines to run evaluations, which we'll be doing in the next section.
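As a rough preview, pulling a dataset might look like the sketch below. It assumes deepeval's `EvaluationDataset` class and its `pull(alias=...)` method as documented at the time of writing, and that you are already logged in to Confident AI. The `pull_goldens` wrapper and the alias in the usage comment are hypothetical; check the deepeval documentation for your installed version.

```python
def pull_goldens(alias: str):
    """Fetch a dataset from Confident AI by alias and return its goldens."""
    # Imported inside the function so this file loads even without deepeval installed.
    from deepeval.dataset import EvaluationDataset

    dataset = EvaluationDataset()
    dataset.pull(alias=alias)  # needs network access and a Confident AI login
    return dataset.goldens


# Usage, with a hypothetical dataset alias:
# goldens = pull_goldens("Legal Documents Dataset")
```

The two lines that do the work are the `EvaluationDataset()` construction and the `pull` call; the wrapper only exists to keep the sketch self-contained.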