G-Eval is a custom, LLM-evaluated metric, meaning its score is calculated by an LLM. It is the most versatile type of metric
deepeval has to offer and can evaluate almost any use case.
To use GEval, you'll have to provide the following arguments when creating an LLMTestCase: input and actual_output.
You'll also need to supply any additional arguments, such as expected_output or
context, if your evaluation criteria depend on them.
To create a custom metric that uses LLMs for evaluation, simply instantiate the
GEval class and define your evaluation criteria in everyday language:
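from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

coherence_metric = GEval(
    name="Coherence",
    criteria="Coherence - determine if the actual output is coherent with the input.",
    # NOTE: you can only provide either criteria or evaluation_steps, and not both.
    # To use explicit steps, drop criteria and uncomment the line below instead:
    # evaluation_steps=["Check whether the sentences in 'actual output' align with those in 'input'"],
    # Only the test case parameters that the criteria mention
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)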
There are three mandatory and three optional parameters when instantiating a GEval class:
name: the name of the metric.
criteria: a description outlining the specific evaluation aspects for each test case.
evaluation_params: a list of type LLMTestCaseParams. Include only the parameters that are relevant for evaluation.
(optional) evaluation_steps: a list of strings outlining the exact steps the LLM should take for evaluation. You can only provide either evaluation_steps or criteria, and not both.
(optional) threshold: the passing threshold, which defaults to 0.5.
(optional) model: a string specifying which of OpenAI's GPT models to use, OR any custom LLM model of type DeepEvalBaseLLM. Defaults to 'gpt-4-1106-preview'.
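The constraints in the parameter list above can be sketched in plain Python. The GEvalConfig dataclass below is hypothetical and is not part of deepeval; it only mirrors the listed parameters and assumes that exactly one of criteria or evaluation_steps is supplied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GEvalConfig:
    """Hypothetical stand-in for GEval's arguments; not deepeval's own class."""

    name: str
    evaluation_params: List[str]
    criteria: Optional[str] = None
    evaluation_steps: Optional[List[str]] = None
    threshold: float = 0.5  # default passing threshold stated in the list above

    def __post_init__(self) -> None:
        # Assumption: exactly one of criteria / evaluation_steps must be given.
        if (self.criteria is None) == (self.evaluation_steps is None):
            raise ValueError("Provide either criteria or evaluation_steps, not both.")
        if not self.evaluation_params:
            raise ValueError("evaluation_params must list at least one parameter.")
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1.")


config = GEvalConfig(
    name="Coherence",
    evaluation_params=["input", "actual_output"],
    criteria="Coherence - determine if the actual output is coherent with the input.",
)
```

Passing both criteria and evaluation_steps (or neither) to this sketch raises a ValueError, which is one way to surface the either/or rule early rather than at evaluation time.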
For accurate and valid results, only the parameters that are mentioned in
criteria should be included as a member of evaluation_params.
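One rough way to check that evaluation_params matches what the criteria mention is a keyword scan. The helper below is a hypothetical heuristic and not part of deepeval: the phrase table and the function name mentioned_params are assumptions made for illustration, and plain substring matching will miss paraphrases.

```python
from typing import Dict, List

# Assumed mapping from test case parameter names to the phrases a criteria
# string would typically use for them.
PARAM_PHRASES: Dict[str, str] = {
    "input": "input",
    "actual_output": "actual output",
    "expected_output": "expected output",
    "context": "context",
}


def mentioned_params(criteria: str) -> List[str]:
    """Return the parameter names whose phrase appears in the criteria text."""
    text = criteria.lower()
    return [name for name, phrase in PARAM_PHRASES.items() if phrase in text]


criteria = "Coherence - determine if the actual output is coherent with the input."
print(mentioned_params(criteria))  # ['input', 'actual_output']
```

Comparing this list against the parameters you pass as evaluation_params is a quick sanity check before running an evaluation.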
As mentioned in the metrics introduction section, all of
deepeval's metrics return a score ranging from 0 to 1, and a metric is only successful if the evaluation score is equal to or greater than threshold.
GEval is no exception. You can access the score and
reason for each individual GEval metric:
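from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="The sun is shining bright today",
    actual_output="The weather's getting really hot."
)

# Run the evaluation, then read the score and the LLM's reasoning
coherence_metric.measure(test_case)
print(coherence_metric.score)
print(coherence_metric.reason)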
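The pass/fail rule described above (a score from 0 to 1 that must be equal to or greater than the threshold) can be sketched as a small function. The name is_successful here is a hypothetical helper written for illustration, not deepeval's implementation.

```python
def is_successful(score: float, threshold: float = 0.5) -> bool:
    """Return True when a 0-1 score meets or exceeds the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return score >= threshold


print(is_successful(0.5))   # True: a score equal to the threshold passes
print(is_successful(0.49))  # False
```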
Remember, you can configure deepeval to use Azure OpenAI for all LLM-based metrics.