Skip to main content

Introduction

In this tutorial, we'll go through the entire process of evaluating a legal document summarizer, from choosing your metrics to running evaluations.

tip

If you're working with LLMs for summarization, this tutorial is for you. While we'll be specifically focusing on evaluating a legal document summarizer, the concepts and guides apply to any LLM application that can generate summaries.

In these guides, we'll cover:

note

Before we begin, make sure you're logged into Confident AI. If you haven’t set up your account yet, visit the section to do so.

deepeval login

The LLM summarizer application we'll be evaluating in this set of tutorials is designed to extract key points from legal documents. It simply accepts a string of document text as the input and outputs a summary. The goal is to have our summarizer ensure that important clauses, obligations, and legal nuances are preserved, albeit without unnecessary details, and without misinterpretation.

info

We'll be using gpt-3.5 to power our LLM summarizer. Below is the prompt template that we'll be using to guide the model's summaries:

You are an AI assistant tasked with summarizing legal documents
concisely and accurately. Given the following legal text, generate
a summary that captures the key points while avoiding unnecessary
details. Ensure neutrality and refrain from interpreting beyond the
provided text.

With this, let's move on to the first step in evaluating our legal document summarizer in the next section: defining our evaluation criteria.