Multimodal Faithfulness
The multimodal faithfulness metric measures the quality of your RAG pipeline's generator by evaluating whether the `actual_output` factually aligns with the contents of your `retrieval_context`. `deepeval`'s multimodal faithfulness metric is a self-explaining MLLM-Eval, meaning it outputs a reason for its metric score.

The multimodal faithfulness metric is the multimodal adaptation of `deepeval`'s faithfulness metric. It accepts images in addition to text for the `input`, `actual_output`, and `retrieval_context`.
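Images are passed as `MLLMImage` objects alongside plain strings. The snippet below is a minimal sketch of constructing one; the `url` and `local` keyword arguments are assumptions and may differ in your version of `deepeval`.

```python
from deepeval.test_case import MLLMImage

# Assumed constructor arguments: a remote URL, or a local file path flagged with `local`
remote_image = MLLMImage(url="https://example.com/eiffel_tower.png")
local_image = MLLMImage(url="./images/eiffel_tower.png", local=True)
```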
Required Arguments
To use the `MultimodalFaithfulnessMetric`, you'll have to provide the following arguments when creating an `MLLMTestCase`:

- `input`
- `actual_output`
- `retrieval_context`
Example
```python
from deepeval import evaluate
from deepeval.metrics import MultimodalFaithfulnessMetric
from deepeval.test_case import MLLMTestCase, MLLMImage

metric = MultimodalFaithfulnessMetric()
test_case = MLLMTestCase(
    input=["Tell me about some landmarks in France"],
    actual_output=[
        "France is home to iconic landmarks like the Eiffel Tower in Paris.",
        MLLMImage(...)
    ],
    retrieval_context=[
        MLLMImage(...),
        "The Eiffel Tower is a wrought-iron lattice tower built in the late 19th century.",
        MLLMImage(...)
    ]
)

metric.measure(test_case)
print(metric.score)
print(metric.reason)

# or evaluate test cases in bulk
evaluate([test_case], [metric])
```
There are seven optional parameters when creating a `MultimodalFaithfulnessMetric` (see the configuration sketch after this list):

- [Optional] `threshold`: a float representing the minimum passing threshold, defaulted to 0.5.
- [Optional] `model`: a string specifying which of OpenAI's multimodal GPT models to use, OR any custom MLLM model of type `DeepEvalBaseMLLM`. Defaulted to `'gpt-4o'`.
- [Optional] `include_reason`: a boolean which when set to `True`, will include a reason for its evaluation score. Defaulted to `True`.
- [Optional] `strict_mode`: a boolean which when set to `True`, enforces a binary metric score: 1 for perfection, 0 otherwise. It also overrides the current threshold and sets it to 1. Defaulted to `False`.
- [Optional] `async_mode`: a boolean which when set to `True`, enables concurrent execution within the `measure()` method. Defaulted to `True`.
- [Optional] `verbose_mode`: a boolean which when set to `True`, prints the intermediate steps used to calculate the metric to the console, as outlined in the How Is It Calculated section. Defaulted to `False`.
- [Optional] `truths_extraction_limit`: an int which, when set, determines the maximum number of factual truths to extract from the `retrieval_context`. The truths extracted will be used to determine the degree of factual alignment, and will be ordered by importance, as decided by your evaluation `model`. Defaulted to `None`.
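For example, a customized metric might look like the following. This is a minimal sketch; the parameter values are illustrative, not recommendations.

```python
from deepeval.metrics import MultimodalFaithfulnessMetric

# Illustrative values for the optional parameters described above
metric = MultimodalFaithfulnessMetric(
    threshold=0.7,               # require a higher score to pass
    model="gpt-4o",              # OpenAI multimodal model, or a DeepEvalBaseMLLM instance
    include_reason=True,         # return an explanation alongside the score
    strict_mode=False,           # keep graded (non-binary) scoring
    async_mode=True,             # run intermediate steps concurrently in measure()
    verbose_mode=False,          # suppress intermediate-step logging
    truths_extraction_limit=10,  # only consider the 10 most important truths
)
```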
How Is It Calculated?
The `MultimodalFaithfulnessMetric` score is calculated according to the following equation:

$$
\text{Multimodal Faithfulness} = \frac{\text{Number of Truthful Claims}}{\text{Total Number of Claims}}
$$
The `MultimodalFaithfulnessMetric` first uses an MLLM to extract all claims made in the `actual_output` (including from images), before using the same MLLM to classify whether each claim is truthful based on the facts presented in the `retrieval_context`.

A claim is considered truthful if it does not contradict any facts presented in the `retrieval_context`.
Sometimes, you may want to consider only the most important factual truths in the `retrieval_context`. If this is the case, you can set the `truths_extraction_limit` parameter to cap the number of truths considered during evaluation.
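To make the scoring concrete, here is a conceptual sketch of how per-claim verdicts map to a score. This is not `deepeval`'s internal implementation; the verdict labels and helper logic below are illustrative assumptions.

```python
# Hypothetical per-claim verdicts produced by the evaluation MLLM after comparing
# each extracted claim against the truths in the retrieval_context.
claim_verdicts = ["yes", "yes", "idk", "no"]  # illustrative values

# A claim counts as truthful as long as it does not contradict the retrieval_context,
# so only outright contradictions ("no") reduce the score.
truthful_claims = sum(1 for verdict in claim_verdicts if verdict != "no")
score = truthful_claims / len(claim_verdicts)

print(score)  # 0.75 -> passes the default threshold of 0.5
```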