Skip to main content


The RAGAS metric is the average of four distinct metrics:

  • RAGASAnswerRelevancyMetric
  • RAGASFaithfulnessMetric
  • RAGASContextualPrecisionMetric
  • RAGASContextualRecallMetric

It provides a score to holistically evaluate of your RAG pipeline's generator and retriever.

Required Arguments

To use the RagasMetric, you'll have to provide the following arguments when creating an LLMTestCase:

  • input
  • actual_output
  • expected_output
  • retrieval_context


from deepeval import evaluate
from deepeval.metrics.ragas import RagasMetric
from deepeval.test_case import LLMTestCase

# Replace this with the actual output from your LLM application
actual_output = "We offer a 30-day full refund at no extra cost."

# Replace this with the expected output from your RAG generator
expected_output = "You are eligible for a 30 day full refund at no extra cost."

# Replace this with the actual retrieved context from your RAG pipeline
retrieval_context = ["All customers are eligible for a 30 day full refund at no extra cost."]

metric = RagasMetric(threshold=0.5, model="gpt-3.5-turbo")
test_case = LLMTestCase(
input="What if these shoes don't fit?",


# or evaluate test cases in bulk
evaluate([test_case], [metric])

There are three optional parameters when creating a RagasMetric:

  • [Optional] threshold: a float representing the minimum passing threshold, defaulted to 0.5.
  • [Optional] model: a string specifying which of OpenAI's GPT models to use, OR any one of langchain's chat models of type BaseChatModel. Defaulted to 'gpt-3.5-turbo'.

You can also choose to import and evaluate using each metric individually:

from deepeval.metrics.ragas import RAGASAnswerRelevancyMetric
from deepeval.metrics.ragas import RAGASFaithfulnessMetric
from deepeval.metrics.ragas import RAGASContextualRecallMetric
from deepeval.metrics.ragas import RAGASContextualPrecisionMetric

These metrics accept the same arguments as the RagasMetric.