Toxicity Guard
The Toxicity Guard is an output guard that analyzes the responses generated by your language model to detect harmful, abusive, or toxic language. This includes offensive remarks, hate speech, harassment, and other forms of abusive content, helping ensure all outputs are respectful and appropriate.
ToxicityGuard is only available as an output guard.
Here's what an unsafe output would look like in the context of toxicity:
"You're a complete idiot for thinking that way."
— probably your LLM
Example
Since ToxicityGuard is an output guard, simply provide it as a guard in the list of guards when initializing a Guardrails object:
from deepeval.guardrails import Guardrails, ToxicityGuard
guardrails = Guardrails(guards=[ToxicityGuard()])
Then, call the guard_output method to make use of the ToxicityGuard:
...
output = generate_output(input)
guard_result = guardrails.guard_output(input=input, output=output)
print(guard_result)
There are no required arguments when initializing a ToxicityGuard.
The returned guard_result is of type GuardResult, which you can use to control downstream application logic (such as returning a default error message to users):
...
print(guard_result.breached, guard_result.guard_data)
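For instance, here is a minimal sketch of that downstream logic, assuming the breached flag shown above and a hypothetical fallback message of your choosing:
...
if guard_result.breached:
    # ToxicityGuard flagged the output, so return a safe fallback message instead
    final_output = "Sorry, I can't provide a response to that."
else:
    # Output passed the guard, so it is safe to return as-is
    final_output = output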