Selecting the Right Summarization Metrics
Having clear evaluation criteria makes selecting the right summarization metrics easy. This is because DeepEval's GEval metric allows you to define custom summarization metrics simply by providing your evaluation criteria.
If you've followed the steps in the previous section, creating a custom summarization metric is as simple as pasting in or rewording your summarization criteria.
For example, if your goal is to generate concise summaries, your evaluation criteria might assess whether the summary remains concise while preserving all essential information from the document. Here's how you can create a custom Concision GEval metric in DeepEval:
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

concision_metric = GEval(
    name="Concision",
    criteria="Assess if the actual output remains concise while preserving all essential information.",
    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT],
)
When defining GEval, you must provide evaluation_params, which specify which test case parameters the metric should evaluate. To learn more about GEval, visit this section.
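To see the metric in action, here's a minimal sketch of how you might run it against a single test case; the document and summary strings are placeholders you'd replace with real data:

from deepeval.test_case import LLMTestCase

# Placeholder document and summary, purely for illustration
test_case = LLMTestCase(
    input="<full text of your legal document>",
    actual_output="<summary generated by your LLM>",
)

concision_metric.measure(test_case)
print(concision_metric.score)   # score between 0 and 1
print(concision_metric.reason)  # the judge LLM's explanation for the score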
Selecting Metrics from Evaluation Criteria
To select the right metrics for our legal document summarizer, let's first revisit the evaluation criteria we defined in the previous section:
- The summary must be concise.
- The summary must be complete.
The first criterion specifies that the summary must be concise. Fortunately, we've already defined a Concision metric in the section above, which we can immediately put to use. The second criterion states that the summary must be complete, ensuring no important information is lost from the original document.
While a brief evaluation criterion (e.g., "the actual output must be concise") is acceptable, it's generally better to be more specific when defining criteria for GEval, which is an LLM-as-a-judge metric. Specific criteria help metric scores remain consistent across evaluation runs, given the non-deterministic nature of LLMs.
Defining a Completeness Metric
For example, when defining your completeness metric, instead of saying "the summary must be complete," a better and more specific alternative is "the summary must retain all important information from the document." Here's a GEval metric for Completeness using the criteria we've just redefined:
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

completeness_metric = GEval(
    name="Completeness",
    criteria="Assess whether the actual output retains all key information from the input.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)
LLMTestCaseParams.INPUT refers to your document (the input to your LLM summarizer), while LLMTestCaseParams.ACTUAL_OUTPUT is the summary (the generated output). More information about LLMTestCaseParams is available here.
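To make this mapping concrete, here's a minimal sketch of a test case for the completeness metric; the strings are placeholders standing in for a real document and summary:

from deepeval.test_case import LLMTestCase

# The document maps to `input`, the generated summary to `actual_output`
test_case = LLMTestCase(
    input="<full text of the legal document>",
    actual_output="<summary produced by your summarizer>",
)

completeness_metric.measure(test_case)
print(completeness_metric.score, completeness_metric.reason)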
Conclusion
With our two metrics for concision and completeness defined, let's begin running evaluations in the next section.
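As a preview, here's a minimal sketch of what such a run could look like using DeepEval's evaluate function, with a placeholder test case:

from deepeval import evaluate
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="<legal document>",
    actual_output="<generated summary>",
)

# Score the same test case against both metrics in one run
evaluate(test_cases=[test_case], metrics=[concision_metric, completeness_metric])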
DeepEval also offers a built-in Summarization metric. Other useful summarization metrics you can define with GEval include the following (a sketch of one appears after this list):
- Fluency – Ensures the summary is grammatically correct and natural.
- Coherence – Checks if the summary flows logically and maintains readability.
- Hallucination – Identifies incorrect or fabricated information.
- Factual Consistency – Ensures the summary accurately reflects the source content.
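For example, here's a minimal sketch of a Factual Consistency metric defined with GEval; the criteria wording is an illustrative assumption you'd tune for your own documents:

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

# Illustrative criteria wording, not a prescribed formulation
factual_consistency_metric = GEval(
    name="Factual Consistency",
    criteria="Assess whether every claim in the actual output is supported by the input document.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)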