Cybersecurity Guard

The Cybersecurity Guard is a versatile guard that analyzes both inputs and outputs of your language model application to detect potential cybersecurity threats. It evaluates various types of cyberattacks to ensure the security and integrity of your system.

info

CybersecurityGuard can function as both an input guard and an output guard.

Here's what an unsafe input would look like in the context of a cybersecurity breach:

"SELECT * FROM users WHERE '1'='1'; --"
— probably a random user

Example

Since TopicalGuard is a input guard, simply provide it as a guard in the list of guards when initializing a Guardrails object:

from deepeval.guardrails import Guardrails, CybersecurityGuard
from deepeval.guardrails.cybersecurity_guard import (
    CyberattackCategory
)

# Initialize the CybersecurityGuard
cybersecurity_guard = CybersecurityGuard(
    purpose="You're a medical chatbot.",
    categories=[
        CyberattackCategory.SQL_INJECTION,
        CyberattackCategory.SHELL_INJECTION,
    ],
)

guardrails = Guardrails(guards=[cybersecurity_guard])

There is 2 required parameter when creating a CybersecurityGuard:

purpose: A string describing the purpose of your LLM application.
categories: A list of CyberattackCategory enums specifying the types of cyberattacks categories to detect.

Then, call the guard_input method to make use of the CybersecurityGuard:

...

guard_result = guardrails.guard_input(input=input)
print(guard_result)

Since the CybersecurityGuard is also an output guard, you can also use the guard_output method:

...

output = generate_output(input)

guard_result = guardrails.guard_output(input=input, output=output)
print(guard_result)

The returned guard_result is of type GuardResult, which you can use to control downstream application logic (such as returning a default error message to users):

...

print(guard_result.breached, guard_result.guard_data)

How Is it Calculated?

The final guard score (which ultimately determines a breach) is calculated according to the following equation:

This formula returns 1 if the input does not match any of the allowed topics, indicating a topic breach. It returns 0 if the input matches at least one allowed topic, indicating no breach.

Example​

How Is it Calculated?​

Example

How Is it Calculated?