Role Adherence
The role adherence metric is a conversational metric that determines whether your LLM chatbot is able to adhere to its given role throughout a conversation.
The RoleAdherenceMetric is particularly useful for role-playing use cases.
Required Arguments
To use the RoleAdherenceMetric, you'll have to provide the following arguments when creating a ConversationalTestCase:
turns
chatbot_role
Additionally, each LLMTestCase in turns requires the following arguments:
input
actual_output
Example
Let's take this conversation as an example:
from deepeval.test_case import LLMTestCase, ConversationalTestCase
from deepeval.metrics import RoleAdherenceMetric
convo_test_case = ConversationalTestCase(
    chatbot_role="...",
    turns=[LLMTestCase(input="...", actual_output="...")]
)
metric = RoleAdherenceMetric(threshold=0.5)
metric.measure(convo_test_case)
print(metric.score)
print(metric.reason)
There are six optional parameters when creating a RoleAdherenceMetric:
- [Optional] threshold: a float representing the minimum passing threshold, defaulted to 0.5.
- [Optional] model: a string specifying which of OpenAI's GPT models to use, OR any custom LLM model of type DeepEvalBaseLLM. Defaulted to 'gpt-4o'.
- [Optional] include_reason: a boolean which, when set to True, will include a reason for its evaluation score. Defaulted to True.
- [Optional] strict_mode: a boolean which, when set to True, enforces a binary metric score: 1 for perfection, 0 otherwise. It also overrides the current threshold and sets it to 1. Defaulted to False.
- [Optional] async_mode: a boolean which, when set to True, enables concurrent execution within the measure() method. Defaulted to True.
- [Optional] verbose_mode: a boolean which, when set to True, prints the intermediate steps used to calculate said metric to the console, as outlined in the How Is It Calculated section. Defaulted to False.
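The interaction between threshold and strict_mode can be easier to follow in code. The sketch below is plain Python and does not call deepeval; the function name finalize_score and its return shape are hypothetical, chosen only to mirror the behaviour described in the list above (a score passes when it is at least the threshold, and strict_mode forces a binary score with a threshold of 1).

```python
def finalize_score(raw_score: float, threshold: float = 0.5, strict_mode: bool = False):
    """Hypothetical helper: return (score, effective_threshold, passed)."""
    if strict_mode:
        # strict_mode overrides whatever threshold was passed and sets it to 1
        threshold = 1.0
        # binary score: 1 for a perfect raw score, 0 otherwise
        score = 1.0 if raw_score >= 1.0 else 0.0
    else:
        score = raw_score
    # a score passes when it meets the minimum passing threshold
    return score, threshold, score >= threshold


print(finalize_score(0.75))                    # (0.75, 0.5, True)
print(finalize_score(0.75, strict_mode=True))  # (0.0, 1.0, False)
print(finalize_score(1.0, strict_mode=True))   # (1.0, 1.0, True)
```

In other words, a conversation that scores 0.75 passes under the default settings but fails under strict_mode, where only a perfect score is accepted.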
How Is It Calculated?
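The RoleAdherenceMetric score is calculated according to the following equation:
Role Adherence = Number of Turns that Adhered to Chatbot Role in Conversation / Total Number of Turns in Conversation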
The RoleAdherenceMetric first loops through each turn individually, then uses an LLM to determine which turns do not adhere to the specified chatbot_role, using previous turns as context.
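The per-turn loop can be sketched in plain Python as follows. This is an illustration, not deepeval's implementation: it assumes the score is the fraction of turns judged to adhere to the role, and it replaces the LLM call with a toy keyword check. The names role_adherence_score, keyword_judge, and the Turn alias are all hypothetical.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (input, actual_output)


def role_adherence_score(
    chatbot_role: str,
    turns: List[Turn],
    judge: Callable[[str, List[Turn], Turn], bool],
) -> float:
    """Assumed scoring rule: fraction of turns the judge marks as in-role."""
    if not turns:
        raise ValueError("at least one turn is required")
    adhering = 0
    for i, turn in enumerate(turns):
        previous = turns[:i]  # earlier turns are passed to the judge as context
        if judge(chatbot_role, previous, turn):
            adhering += 1
    return adhering / len(turns)


def keyword_judge(role: str, previous: List[Turn], turn: Turn) -> bool:
    # Toy stand-in for the LLM verdict: a reply breaks role if it says "as an AI".
    return "as an ai" not in turn[1].lower()


turns = [
    ("Who are you?", "I am Merlin, wizard of the old forest."),
    ("Can you cast a spell?", "As an AI, I cannot cast spells."),
    ("What is in your hat?", "A very patient rabbit."),
    ("Goodbye!", "Farewell, traveller."),
]
score = role_adherence_score("a wizard", turns, keyword_judge)
print(score)  # 0.75
```

Here three of the four replies stay in character, so the sketch reports 0.75. In the real metric, the judge's verdict comes from the configured model rather than a keyword match.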