๐Ÿ“ฆ Elgenzay / llm-scanner

A job application assignment

โ˜… 2 stars โ‘‚ 0 forks ๐Ÿ‘ 2 watching โš–๏ธ BSD Zero Clause License
๐Ÿ“ฅ Clone https://github.com/Elgenzay/llm-scanner.git
HTTPS git clone https://github.com/Elgenzay/llm-scanner.git
SSH git clone git@github.com:Elgenzay/llm-scanner.git
CLI gh repo clone Elgenzay/llm-scanner
Jason Jason Add Unknown variant on error 343bdbb 2 months ago ๐Ÿ“ History
๐Ÿ“‚ main View all commits โ†’
๐Ÿ“ data
๐Ÿ“ defaults
๐Ÿ“ output
๐Ÿ“ src
๐Ÿ“„ .gitignore
๐Ÿ“„ Cargo.lock
๐Ÿ“„ Cargo.toml
๐Ÿ“„ LICENSE
๐Ÿ“„ README.md
๐Ÿ“„ README.md

LLM Scanner

An LLM jailbreak scanner with configurable detection methods.

Requires VHACK running at http://localhost:8000 (if mock_mode = false)

Quick Start

# Clone and build
git clone git@github.com:Elgenzay/llm-scanner.git
cd llm-scanner

# Run with default settings (mock mode enabled)
cargo run

Configuration

On first run, the scanner automatically generates:

  • config.toml - Main configuration file
  • data/ - Directory containing all data files
All configuration options can be overridden via command line:
cargo run -- \
  --target http://localhost:8000/api/chat \
  --prompts data/prompts.csv \
  --concurrency 4 \
  --timeout-ms 30000 \
  --out report.jsonl \
  --detection-method llm \
  --mock-mode false

Data Files (data/)

All files are auto-generated with defaults on first run and can be customized:

  • prompts.csv - List of prompts to test (one per line)
  • safe_patterns.csv - Patterns indicating safe/refused responses
  • unsafe_patterns.csv - Patterns indicating jailbreak success
  • test_responses.csv - Mock responses for testing (used when mock_mode = true)
  • judge_prompt.md - Prompt template for LLM-based evaluation
Mock Mode Behavior: Responses in data/test_responses.csv correspond by line number to prompts in data/prompts.csv. The first prompt gets the first response, etc.

Detection Methods

Pattern

Fast keyword matching using safe and unsafe pattern files. Defaults to safe if no patterns match.

LLM

Uses the target LLM itself as a judge to evaluate if responses contain jailbreaks. Reads evaluation criteria from data/judge_prompt.md.

Hybrid (Default)

Attempts pattern matching first. If no pattern matches (neither safe nor unsafe), falls back to LLM evaluation.

Output

The scanner generates two output files to output/:

  • report.jsonl - Results in JSONL format
  • summary.html - HTML summary