OWASP LLM

How to align LLM security controls with OWASP guidance and keep evidence for audits.

This page defines an operational control model for LLM systems using only mechanisms that are supported by the research in this repo: structured prompting, schema-bound outputs, validation loops, and audit-ready evidence. The core idea is to turn high-level security goals into repeatable prompt and tool controls that produce verifiable artifacts. The approach uses explicit role/instruction prompts, constrained JSON output, and validation gates (RAV/RAC) to reduce ambiguity and preserve traceability. You can map these controls to any external governance framework, but this page only includes methods and claims grounded in the repo.

Logical Framework

Core concepts

  • Control objective: a precise requirement you want the model to satisfy (e.g., extract only supported fields, attach evidence, return confidence).
  • Prompt control: system and instruction prompts that define role, scope, and output constraints.
  • Tool control: schema templates and validators that restrict output shape and enforce evidence capture.
  • Evidence: location references, confidence scores, and validation results recorded with each output; one possible record shape is sketched below.
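
The evidence concept is concrete enough to pin down as a data structure. A minimal sketch, assuming field names (name, value, confidence, location) that are illustrative rather than the repo's canonical ones; later sketches on this page carry the same fields as plain dicts parsed from model JSON.

from dataclasses import dataclass

@dataclass
class FieldEvidence:
    """Per-field evidence record: the value plus the audit trail attached to it."""
    name: str          # schema field name
    value: str         # extracted value, or the literal "NOT_FOUND"
    confidence: float  # 0.0-1.0 score reported with the extraction
    location: str      # where in the source the value was found, or "NOT_FOUND"

    def is_missing(self) -> bool:
        return self.value == "NOT_FOUND"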

Taxonomy of controls

  1. Instruction controls: role persona + extraction rules and prohibited behaviors.
  2. Structure controls: required JSON schema, NOT_FOUND placeholders, and ordered fields (enforced mechanically in the sketch after this list).
  3. Validation controls: RAV/RAC checks for completeness and consistency.
  4. Traceability controls: location references and confidence scoring for audit trails.
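
Structure controls can be enforced before any content-level validation runs. A minimal sketch, assuming an illustrative invoice schema and the nested value/confidence/location record shape from above; the repo's actual schemas will differ.

# Ordered required fields and the evidence keys each record must carry
# (the schema itself is an assumption for this example, not from the repo).
TARGET_SCHEMA = ["invoice_number", "issue_date", "total_amount"]
RECORD_KEYS = {"value", "confidence", "location"}

def check_structure(output: dict) -> list[str]:
    """Return a list of violations; an empty list means the structure control passes."""
    errors = [f"missing required field: {f}" for f in TARGET_SCHEMA if f not in output]
    errors += [f"unexpected field: {f}" for f in output if f not in TARGET_SCHEMA]
    for field in TARGET_SCHEMA:
        record = output.get(field)
        if record is not None and (not isinstance(record, dict)
                                   or not RECORD_KEYS <= record.keys()):
            errors.append(f"{field}: must be an object with value, confidence, location")
    return errors

Rejecting unexpected fields makes the control cut both ways: the model can neither omit a field silently nor invent one.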

Repeatable workflow

Inputs: document(s), target schema, control objectives, allowed sources.

Steps:

  1. Define the schema and role persona in a system prompt.
  2. Encode extraction rules and output constraints in instructions.
  3. Run multi-step extraction (IRZ-CoT) with explicit field requirements.
  4. Validate with RAV/RAC and enforce NOT_FOUND for missing fields.
  5. Emit JSON with confidence and location references.

Outputs: schema-compliant JSON, validation results, and traceable evidence.

Checkpoints: schema validation pass, confidence thresholds met, location references present. The sketch below ties the steps together in code.
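
A minimal sketch that collapses the multi-step IRZ-CoT extraction into a single call for brevity; call_model is a hypothetical stand-in for your LLM client, and check_structure comes from the structure-control sketch above.

import json

def run_extraction(document: str, call_model) -> dict:
    # Step 1: the role persona lives in the system prompt.
    system_prompt = "You are an expert extraction assistant."
    # Step 2: extraction rules and output constraints live in the instructions.
    instructions = (
        "Extract only these fields: " + ", ".join(TARGET_SCHEMA) + ". "
        "If a field is missing, return NOT_FOUND. "
        "Provide confidence and a location reference for each field. "
        "Output JSON only, no extra text.\n\n" + document
    )
    # Step 3: run the extraction (one call here; IRZ-CoT would split this
    # into explicit per-field reasoning steps).
    reply = call_model(system_prompt, instructions)
    # Steps 4-5: parse the schema-bound JSON and gate it before accepting it.
    output = json.loads(reply)
    violations = check_structure(output)
    if violations:
        raise ValueError(f"validation gate failed: {violations}")
    return output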

Practical guidance and guardrails

Do

  • require a strict JSON schema and explicit NOT_FOUND for missing data
  • include confidence scores and location references for each extracted field
  • run validation checks (RAV/RAC) before accepting outputs
  • separate role persona (system prompt) from extraction rules (instructions), as in the sketch below
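
One concrete reading of the last item, using the common chat-message convention (the roles and message shape are assumptions, not repo specifics): persona and rules never share a message, so either can be audited or swapped independently.

# Persona and rules travel as separate messages; neither restates the other.
messages = [
    {"role": "system", "content": "You are an expert extraction assistant."},
    {"role": "user", "content": (
        "Extract only the schema fields. "
        "If a field is missing, return NOT_FOUND. "
        "Provide confidence and a location reference for each field. "
        "Output JSON only, no extra text."
    )},
]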

Don't

  • accept free-form output when a schema is required
  • allow missing fields without explicit NOT_FOUND markers
  • skip validation gates or ignore confidence signals

Failure modes and mitigations

  • Hallucinated fields: mitigate with schema enforcement and NOT_FOUND placeholders.
  • Inconsistent extraction: mitigate with IRZ-CoT step structure and deterministic output rules.
  • Missing evidence: mitigate with mandatory location references and confidence per field; the gate sketched below combines all three mitigations.
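
A hedged sketch of a RAV/RAC-style check (the repo's actual validators may differ), meant to run after check_structure so the record shape is already guaranteed; the confidence threshold is an assumption.

def rav_rac_gate(output: dict, min_confidence: float = 0.5) -> list[str]:
    """Completeness and evidence checks per field; an empty list means the gate passes."""
    errors = []
    for field, record in output.items():
        value = record.get("value")
        if value is None:
            errors.append(f"{field}: no value recorded")
        elif value != "NOT_FOUND":
            # Missing-evidence mitigation: every real value needs a location.
            if record.get("location") in (None, "", "NOT_FOUND"):
                errors.append(f"{field}: value lacks a location reference")
            # Low-confidence values are flagged rather than silently accepted.
            if record.get("confidence", 0.0) < min_confidence:
                errors.append(f"{field}: confidence below {min_confidence}")
    return errors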

Evaluation criteria

  • Schema compliance: output matches required fields and types.
  • Evidence coverage: each field includes a location reference or NOT_FOUND.
  • Validation pass rate: RAV/RAC checks succeed consistently.
  • Confidence profile: confidence scores are recorded and reviewed for low-trust fields.
  • Traceability: outputs are reproducible and auditable with stored evidence (a batch-scoring sketch follows this list).
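
Each criterion reduces to a number you can track per batch. A sketch reusing check_structure and rav_rac_gate from the earlier sketches; the metric names and the 0.5 threshold are assumptions.

def evaluate_batch(outputs: list[dict]) -> dict:
    """Score a batch of outputs against the evaluation criteria above."""
    if not outputs:
        return {}
    records = [r for o in outputs for r in o.values() if isinstance(r, dict)]
    covered = sum(
        1 for r in records
        if r.get("value") == "NOT_FOUND"
        or r.get("location") not in (None, "", "NOT_FOUND")
    )
    return {
        "schema_compliance": sum(not check_structure(o) for o in outputs) / len(outputs),
        "validation_pass_rate": sum(not rav_rac_gate(o) for o in outputs) / len(outputs),
        "evidence_coverage": covered / len(records) if records else 0.0,
        "low_confidence_fields": sum(r.get("confidence", 0.0) < 0.5 for r in records),
    }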

Example prompt pattern

The repo includes an IRZ-CoT extractor prompt that uses a role persona, explicit extraction rules, and a strict JSON schema. The pattern below mirrors that structure:

Role: You are an expert extraction assistant.
Instructions:
- Extract only the schema fields.
- If a field is missing, return NOT_FOUND.
- Provide confidence and location references for each field.
Output format: JSON only, no extra text.
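
Wiring the pattern up ends with a strict parse: json.loads raises on malformed JSON or trailing text, which operationalizes "JSON only, no extra text". The reply below is invented for the example (field names and values are assumptions); both gates from the earlier sketches return no violations for it.

import json

def parse_reply(reply: str) -> dict:
    """Reject anything that is not exactly one JSON object."""
    return json.loads(reply.strip())

reply = """{
  "invoice_number": {"value": "INV-0042",  "confidence": 0.92, "location": "page 1, header"},
  "issue_date":     {"value": "NOT_FOUND", "confidence": 0.0,  "location": "NOT_FOUND"},
  "total_amount":   {"value": "1,250.00",  "confidence": 0.88, "location": "page 2, line 14"}
}"""

output = parse_reply(reply)
print(check_structure(output) + rav_rac_gate(output))  # prints [] when every gate passes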
