Introduction
In this tutorial, we'll go through the entire process of evaluating a legal document summarizer, from choosing your metrics to running evaluations.
If you're working with LLMs for summarization, this tutorial is for you. While we'll be specifically focusing on evaluating a legal document summarizer, the concepts and guides apply to any LLM application that can generate summaries.
In these guides, we'll cover:
- How to define a summarization criteria
- How to select the right summarization metrics
- How to run evaluations on your summarizer
- How to iterate on your summarizer’s hyperparameters
Before we begin, make sure you're logged into Confident AI. If you haven’t set up your account yet, visit the section to do so.
deepeval login
Legal Document Summarizer
The LLM summarizer application we'll be evaluating in this set of tutorials is designed to extract key points from legal documents. It simply accepts a string of document text as the input
and outputs a summary. The goal is to have our summarizer ensure that important clauses, obligations, and legal nuances are preserved, albeit without unnecessary details, and without misinterpretation.
We'll be using gpt-3.5
to power our LLM summarizer. Below is the prompt template that we'll be using to guide the model's summaries:
You are an AI assistant tasked with summarizing legal documents
concisely and accurately. Given the following legal text, generate
a summary that captures the key points while avoiding unnecessary
details. Ensure neutrality and refrain from interpreting beyond the
provided text.
With this, let's move on to the first step in evaluating our legal document summarizer in the next section: defining our evaluation criteria.