Cybersecurity Guard
The Cybersecurity Guard is a versatile guard that analyzes both the inputs and outputs of your language model application to detect potential cybersecurity threats. It checks for the cyberattack categories you configure, such as SQL injection and shell injection, to help protect the security and integrity of your system.
CybersecurityGuard can function as both an input guard and an output guard.
Here's what an unsafe input would look like in the context of a cybersecurity breach:
"SELECT * FROM users WHERE '1'='1'; --"
— probably a random user
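To illustrate why input built on an always-true condition like '1'='1' is treated as unsafe, here is a minimal sketch using Python's built-in sqlite3 module. The table, the helper functions, and the payload are hypothetical and are not part of deepeval. The sketch only shows what can happen when such text reaches a query that is assembled by string concatenation.

```python
import sqlite3

# Hypothetical in-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)


def find_user_unsafe(name):
    # Vulnerable: user text is pasted directly into the SQL string.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Parameterized: the driver treats the text as data, not as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Payload relying on the same always-true '1'='1' condition.
malicious = "x' OR '1'='1"

leaked = find_user_unsafe(malicious)   # condition is always true, so every row matches
blocked = find_user_safe(malicious)    # no user has this literal name
print(len(leaked), len(blocked))
```

A guard screens for such text before it reaches your application logic. Parameterized queries remain the standard defense at the database layer.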
Example
Since CybersecurityGuard can act as an input guard, simply provide it in the list of guards when initializing a Guardrails object:
from deepeval.guardrails import Guardrails, CybersecurityGuard
from deepeval.guardrails.cybersecurity_guard import (
    CyberattackCategory
)

# Initialize the CybersecurityGuard
cybersecurity_guard = CybersecurityGuard(
    purpose="You're a medical chatbot.",
    categories=[
        CyberattackCategory.SQL_INJECTION,
        CyberattackCategory.SHELL_INJECTION,
    ],
)

guardrails = Guardrails(guards=[cybersecurity_guard])
There are 2 required parameters when creating a CybersecurityGuard:
purpose: A string describing the purpose of your LLM application.
categories: A list of CyberattackCategory enums specifying the types of cyberattacks to detect.
Then, call the guard_input method to make use of the CybersecurityGuard:
...
guard_result = guardrails.guard_input(input=input)
print(guard_result)
Since the CybersecurityGuard is also an output guard, you can use the guard_output method as well:
...
output = generate_output(input)
guard_result = guardrails.guard_output(input=input, output=output)
print(guard_result)
The returned guard_result is of type GuardResult, which you can use to control downstream application logic (such as returning a default error message to users):
...
print(guard_result.breached, guard_result.guard_data)
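As a sketch of how the breached flag might drive downstream logic, the helper below checks the input, generates a response, checks the output, and falls back to a default message on any breach. It relies only on the guard_input, guard_output, and breached names shown above. FALLBACK_MESSAGE, guarded_reply, and the stub classes are hypothetical: the stubs stand in for the real Guardrails object so the example runs without deepeval installed, and they do not reflect how deepeval detects attacks.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical default reply shown to users when a guard is breached.
FALLBACK_MESSAGE = "Sorry, I can't help with that request."


def guarded_reply(
    guardrails: Any,
    generate_output: Callable[[str], str],
    user_input: str,
) -> str:
    """Return the model output only if neither guard check reports a breach."""
    # Screen the user input before spending a model call on it.
    if guardrails.guard_input(input=user_input).breached:
        return FALLBACK_MESSAGE
    output = generate_output(user_input)
    # Screen the generated output before it reaches the user.
    if guardrails.guard_output(input=user_input, output=output).breached:
        return FALLBACK_MESSAGE
    return output


# --- Stand-ins for illustration only (not the real deepeval classes) ---
@dataclass
class StubResult:
    breached: bool


class StubGuardrails:
    def guard_input(self, input: str) -> StubResult:
        # Toy rule: flag the always-true SQL condition from the example above.
        return StubResult(breached="'1'='1'" in input)

    def guard_output(self, input: str, output: str) -> StubResult:
        return StubResult(breached=False)


def echo_model(text: str) -> str:
    # Placeholder for your own generation function.
    return f"echo: {text}"


safe_reply = guarded_reply(StubGuardrails(), echo_model, "What are flu symptoms?")
blocked_reply = guarded_reply(
    StubGuardrails(), echo_model, "SELECT * FROM users WHERE '1'='1'; --"
)
print(safe_reply)
print(blocked_reply)
```

In a real application you would pass the Guardrails object created earlier in place of StubGuardrails and your own generation function in place of echo_model.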
How Is It Calculated?
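The final guard score (which ultimately determines a breach) is calculated according to the following equation:
$$\text{Guard Score} = \begin{cases} 1 & \text{if a cyberattack in any of the specified categories is detected} \\ 0 & \text{otherwise} \end{cases}$$
This formula returns 1 if the input or output contains an attack from at least one of the specified categories, indicating a breach. It returns 0 if no such attack is detected, indicating no breach.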
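The binary nature of the score can be sketched in a few lines. This assumes the guard produces one boolean verdict per configured category, which is an assumption made for illustration: the source does not describe how individual verdicts are produced, and guard_score is a hypothetical helper, not a deepeval function.

```python
from typing import Mapping


def guard_score(detections: Mapping[str, bool]) -> int:
    """Return 1 (breach) if any category was flagged, otherwise 0 (no breach)."""
    # Assumption: each value is a per-category verdict, True meaning "detected".
    return 1 if any(detections.values()) else 0


breach = guard_score({"SQL_INJECTION": True, "SHELL_INJECTION": False})
clean = guard_score({"SQL_INJECTION": False, "SHELL_INJECTION": False})
print(breach, clean)
```

Under this reading, a single flagged category is enough to mark the whole check as breached.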