
Hallucination

The hallucination metric determines whether your LLM generates factually correct information by comparing the actual_output to the provided context.

info

If you're looking to evaluate hallucination for a RAG system, please refer to the faithfulness metric instead.

Required Arguments

To use the HallucinationMetric, you'll have to provide the following arguments when creating an LLMTestCase (a minimal skeleton follows the note below):

  • input
  • actual_output
  • context

note

Remember, input and actual_output are mandatory arguments to an LLMTestCase and so are always required even if not used for evaluation.
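As a minimal skeleton (the "..." values are placeholders for your own data), a test case for this metric only needs these three fields; the comments note which ones the metric actually evaluates:

from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="...",            # mandatory on LLMTestCase, but not compared by this metric
    actual_output="...",    # your LLM's generated text, checked against context
    context=["..."]         # the ground-truth documents the output is compared to
)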

Example

from deepeval import evaluate
from deepeval.metrics import HallucinationMetric
from deepeval.test_case import LLMTestCase

# Replace this with the actual documents that you are passing as input to your LLM.
context = ["A man with blond-hair, and a brown shirt drinking out of a public water fountain."]

# Replace this with the actual output from your LLM application
actual_output = "A blond drinking water in public."

test_case = LLMTestCase(
    input="What was the blond doing?",
    actual_output=actual_output,
    context=context
)
metric = HallucinationMetric(threshold=0.5)

metric.measure(test_case)
print(metric.score)
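# include_reason defaults to True, so the metric also provides a reason for its score
# (assuming metric.reason is populated after calling measure(); see the parameter list below)
print(metric.reason)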

# or evaluate test cases in bulk
evaluate([test_case], [metric])

There are four optional parameters when creating a HallucinationMetric, all of which can be passed together as shown in the sketch after this list:

  • [Optional] threshold: a float representing the maximum passing threshold, defaulted to 0.5.
  • [Optional] model: a string specifying which of OpenAI's GPT models to use, OR any custom LLM model of type DeepEvalBaseLLM. Defaulted to 'gpt-4-1106-preview'.
  • [Optional] include_reason: a boolean which, when set to True, includes a reason for the evaluation score. Defaulted to True.
  • [Optional] multithreading: a boolean which, when set to True, enables concurrent evaluation of the metric. Defaulted to True.
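
As a rough sketch, these parameters can be combined when constructing the metric (this reuses the HallucinationMetric import from the example above, and the values simply restate the documented defaults):

metric = HallucinationMetric(
    threshold=0.5,
    model="gpt-4-1106-preview",
    include_reason=True,
    multithreading=True
)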