
AI Security Glossary

20 essential terms for security engineers and CISOs evaluating AI coding tool risk. From code mutation to zero-trust AI, this glossary covers the vocabulary you need to make informed decisions about developer AI tool security.

20 terms defined · 5 categories covered · Real-world examples for each term
Mutation
6 terms

Byte-Exact Reversal

Mutation

Restoration of mutated code to its exact original form using a stored mutation map, with no byte-level differences from the pre-mutation source. Byte-exact reversal guarantees that the developer receives working code indistinguishable from what they would have written directly. It requires that every mutation be deterministic and fully recorded.

Example

After Claude Code suggests a refactor, Pretense applies byte-exact reversal so the response contains the original function names, class names, and variable names the developer used.
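A minimal sketch of the reversal step, assuming the mutation map is a simple in-memory dictionary; the identifiers below are the illustrative ones used elsewhere in this glossary, not a real schema:

```python
import re

# Hypothetical mutation map recorded when the prompt was mutated.
MUTATION_MAP = {"getUserPaymentToken": "_fn4a2b", "AuthService": "_cls2d9e"}

def reverse_mutations(ai_response: str, mutation_map: dict[str, str]) -> str:
    """Restore original identifiers so the result is byte-identical
    to code written with the pre-mutation names."""
    reverse_map = {synthetic: original for original, synthetic in mutation_map.items()}
    # Longest-first ordering avoids partial replacement of overlapping tokens.
    pattern = re.compile("|".join(sorted(map(re.escape, reverse_map), key=len, reverse=True)))
    return pattern.sub(lambda m: reverse_map[m.group(0)], ai_response)

restored = reverse_mutations("return _cls2d9e._fn4a2b(user)", MUTATION_MAP)
print(restored)  # return AuthService.getUserPaymentToken(user)
```

Because every substitution is recorded and deterministic, the restored text can be compared byte-for-byte against the pre-mutation source.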

Code Mutation

Mutation

The process of deterministically transforming proprietary code identifiers into synthetic equivalents before transmission to AI models. Code mutation preserves code structure, syntax, and logical relationships while replacing the names that make a codebase proprietary. Unlike redaction, mutation allows LLMs to reason fully about the code and return high-quality completions.

Example

Pretense mutates getUserPaymentToken to _fn4a2b before sending to Claude, preventing Anthropic from learning your authentication architecture.

Deterministic Mutation

Mutation

A mutation algorithm where the same input identifier always produces the same synthetic output token, enabling reliable reversal without storing every substitution individually. Deterministic mutation uses a hash function seeded by the identifier string, ensuring consistency across sessions and files. The output is repeatable and auditable.

Example

Pretense hashes processPayment using a 4-character hex digest to always produce _fn3a2b, so every file in a project that references processPayment receives the same synthetic token.
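The hashing step can be sketched as follows. The digest scheme and `_fn` prefix here are assumptions for illustration, so the token produced will differ from the `_fn3a2b` example above:

```python
import hashlib

def mutate_identifier(name: str, prefix: str = "_fn") -> str:
    """Derive a synthetic token from a hash of the identifier string.
    The same input always yields the same output, so no per-substitution
    state is needed for consistency across files and sessions.
    (Illustrative; Pretense's actual hash scheme is not documented here.)"""
    digest = hashlib.sha256(name.encode()).hexdigest()[:4]  # 4-char hex digest
    return prefix + digest

token = mutate_identifier("processPayment")
assert token == mutate_identifier("processPayment")  # repeatable across calls
```

Because the output depends only on the input string, any file referencing the same identifier receives the same synthetic token without a lookup.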

Identifier Mutation

Mutation

The specific technique of replacing function names, variable names, and class names with hash-derived synthetic tokens while leaving the structural elements of code (brackets, operators, keywords, types) intact. Identifier mutation targets the layer of code that is most proprietary and least necessary for an LLM to understand in its literal form. The LLM can reason about _fn4a2b the same way it reasons about getUserPaymentToken because both are valid identifiers in the same syntactic position.

Example

Pretense performs identifier mutation on a TypeScript file, converting verifyJwtClaims to _fn7b3c and AuthService to _cls2d9e before the file is included in an AI prompt.
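A simplified, Python-only sketch of the idea; a production implementation would be language-aware and would skip strings, comments, imports, and builtins:

```python
import hashlib
import keyword
import re

def mutate_identifiers(source: str) -> str:
    """Replace user-defined identifiers with hash-derived tokens while
    leaving keywords, operators, and punctuation untouched.
    (The "_id" prefix and 4-char digest are illustrative assumptions.)"""
    def replace(match: re.Match) -> str:
        name = match.group(0)
        if keyword.iskeyword(name):
            return name  # structural elements stay intact
        return "_id" + hashlib.sha256(name.encode()).hexdigest()[:4]
    return re.sub(r"\b[A-Za-z_][A-Za-z0-9_]*\b", replace, source)

print(mutate_identifiers("def verify(claims): return claims"))
```

Note that `def` and `return` survive unchanged while both occurrences of `claims` map to the same token, which is what lets the LLM still follow the data flow.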

Mutation Map

Mutation

A persistent mapping between original identifiers and their mutated synthetic equivalents, stored locally and used to reverse mutations in AI responses. The mutation map is the foundation of byte-exact reversal and must be stored securely since it contains the original proprietary names. Pretense stores mutation maps in a local SQLite database with WAL mode for durability.

Example

Pretense's mutation map records that processStripeWebhook maps to _fn8c4a, enabling it to restore the original name when Claude returns a suggestion using the synthetic identifier.
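A hedged sketch of what such a store could look like using SQLite with WAL mode; the table name and schema are assumptions for illustration, not Pretense's actual layout:

```python
import sqlite3

conn = sqlite3.connect("mutation_map.db")
conn.execute("PRAGMA journal_mode=WAL")  # durable writes, concurrent readers
conn.execute("""CREATE TABLE IF NOT EXISTS mutation_map (
    original  TEXT PRIMARY KEY,    -- proprietary identifier (sensitive)
    synthetic TEXT NOT NULL UNIQUE -- hash-derived replacement token
)""")
conn.execute("INSERT OR IGNORE INTO mutation_map VALUES (?, ?)",
             ("processStripeWebhook", "_fn8c4a"))
conn.commit()

# Reversal lookup: synthetic token back to the original name.
row = conn.execute("SELECT original FROM mutation_map WHERE synthetic = ?",
                   ("_fn8c4a",)).fetchone()
print(row[0])  # processStripeWebhook
conn.close()
```

The UNIQUE constraint on the synthetic column makes the mapping invertible, which is the precondition for byte-exact reversal.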

Semantic Preservation

Mutation

The property of code mutation where AI models can still reason about structure, logic, and relationships despite identifier changes. Semantic preservation depends on the fact that LLMs encode semantic meaning from syntactic position and structural patterns rather than memorized identifier names. A function called _fn4a2b in a payment processing context carries the same implicit meaning as getUserPaymentToken to a sufficiently capable LLM.

Example

After Pretense mutates a database query module, GPT-4 correctly suggests optimizations to the query pattern because the structural semantics are preserved, even though all table and column names are synthetic.

Detection
2 terms

Secret Scanning

Detection

Regex-based detection of API keys, credentials, tokens, and PII in source code or prompts before transmission to AI models. Secret scanning uses pattern libraries covering 30 or more secret types including AWS access keys, Stripe keys, GitHub tokens, JWTs, and database connection strings. It is a complement to code mutation: mutation protects identifier-level IP, while secret scanning blocks credential-level exposure.

Example

Pretense scans an outbound prompt and detects a Stripe secret key matching sk_live_[0-9a-zA-Z]{24}, blocking the request and returning a clear error to the developer.
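A minimal scanner over a toy pattern library (three patterns here versus the 30+ secret types mentioned above):

```python
import re

# Small illustrative pattern library; a real scanner covers many more types.
SECRET_PATTERNS = {
    "stripe_secret_key": re.compile(r"sk_live_[0-9a-zA-Z]{24}"),
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_token": re.compile(r"ghp_[0-9a-zA-Z]{36}"),
}

def scan_for_secrets(prompt: str) -> list[str]:
    """Return the names of secret types found in an outbound prompt."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(prompt)]

hits = scan_for_secrets("stripe.api_key = 'sk_live_" + "a" * 24 + "'")
print(hits)  # ['stripe_secret_key'] -- request would be blocked
```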

Token Scanning

Detection

Lexical analysis that identifies code tokens (identifiers, literals, keywords) suitable for mutation or inspection before AI transmission. Token scanning operates at a level below abstract syntax tree (AST) analysis, parsing raw source text to extract named tokens across multiple programming languages. Pretense's token scanner supports TypeScript, JavaScript, Python, Go, and Java.

Example

Pretense's token scanner processes a Python file and identifies 47 unique identifiers across 12 classes and 89 functions as candidates for mutation before an AI prompt is constructed.
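Using Python's standard tokenize module, identifier extraction without building an AST can be sketched like this (Python-only; a multi-language scanner needs a lexer per language):

```python
import io
import keyword
import tokenize

def scan_identifiers(source: str) -> set[str]:
    """Extract named tokens from raw source text at the lexical level,
    below full AST analysis."""
    names = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens include keywords; filter those out as non-mutable.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            names.add(tok.string)
    return names

source = "def total(items):\n    return sum(i.price for i in items)\n"
print(sorted(scan_identifiers(source)))  # ['i', 'items', 'price', 'sum', 'total']
```

A real scanner would further classify these candidates (e.g. excluding builtins like `sum`) before deciding which to mutate.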

Compliance
4 terms

AI Audit Log

Compliance

An immutable record of all AI API requests, including what data was sent, mutated, and received. AI audit logs are essential for SOC2 and HIPAA compliance frameworks that require proof of data handling practices. They capture mutation metadata, timestamps, tool identifiers, and response hashes.

Example

A CISO exports an AI audit log from Pretense to demonstrate that no unprotected code was transmitted to OpenAI APIs during a quarterly SOC2 review.
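One way such a log entry could be structured, sketched as a JSON line; the field names are illustrative assumptions, not Pretense's actual format:

```python
import hashlib
import json
import time

def audit_record(tool: str, mutated_prompt: str, response: str) -> str:
    """Build one append-only audit log entry as a JSON line. Hashing the
    response proves what was received without storing raw model output."""
    entry = {
        "timestamp": time.time(),
        "tool": tool,
        "prompt_bytes": len(mutated_prompt.encode()),
        "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
    }
    return json.dumps(entry)

line = audit_record("claude-code", "def _fn4a2b(): ...", "suggested refactor")
print(line)
```

Appending such lines to a write-once log gives auditors timestamps, tool identifiers, and response hashes without exposing the underlying code.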

GDPR AI Compliance

Compliance

Meeting General Data Protection Regulation requirements when using AI tools that process code containing personal data, such as user IDs, email addresses, or behavioral identifiers embedded in source files. GDPR applies when code references, processes, or stores EU resident personal data, including during AI-assisted development. Compliance requires data minimization, purpose limitation, and documented transfer mechanisms.

Example

A Berlin-based engineering team uses Pretense to ensure GDPR-covered user identifiers in their codebase are mutated before reaching OpenAI servers, maintaining data minimization compliance.

HIPAA AI Security

Compliance

Ensuring protected health information (PHI) is not exposed to AI models during code development, particularly when engineers write or review code that handles patient records, diagnostic data, or treatment identifiers. HIPAA applies to business associates, including AI vendors, which means using coding assistants on healthcare code may create business associate agreement (BAA) obligations. Local-first mutation eliminates this risk by ensuring PHI-adjacent identifiers never reach third-party APIs.

Example

A hospital engineering team configures Pretense to block and mutate any code pattern referencing patient record identifiers before routing requests through GitHub Copilot.

SOC2 AI Compliance

Compliance

Achieving SOC2 Type II certification specifically for AI tool usage and data handling, requiring documented controls around what data is sent to AI providers, how long it is retained, and whether it is encrypted in transit. SOC2 AI compliance is an emerging audit category as organizations formalize policies around developer AI tool use. Pretense provides exportable audit logs and mutation reports to support SOC2 evidence collection.

Example

During a SOC2 Type II audit, an engineering organization exports Pretense's mutation audit log to demonstrate that no source code identifiers were transmitted to AI APIs without mutation.

Architecture
4 terms

Differential Privacy (AI Context)

Architecture

Techniques that prevent individual data points from being identifiable in AI training datasets or inference logs. In the context of AI coding tools, differential privacy mechanisms add mathematical noise or transform data so that individual code patterns cannot be attributed to a specific organization. It remains largely a research framework in this setting; code mutation provides a more direct guarantee for proprietary identifiers.

Example

A researcher applies differential privacy bounds to model fine-tuning to ensure no individual developer's code patterns can be reconstructed from the resulting model weights.

LLM Firewall

Architecture

A security layer that inspects and transforms data before transmission to large language model APIs, analogous to a network firewall but operating at the application layer on AI prompt content. An LLM firewall may block requests containing secrets, mutate proprietary identifiers, log all outbound AI traffic, and enforce rate limits or content policies. It sits transparently between the developer tool and the AI provider.

Example

Pretense acts as an LLM firewall, intercepting every Cursor request to Claude and applying mutation before the prompt leaves the developer's machine.

Proxy Interception

Architecture

Routing API traffic through a local proxy to inspect, transform, or block requests before they reach their destination. Proxy interception is transparent to the client application: by setting a base URL environment variable, all SDK calls are redirected through the proxy without code changes. This architecture enables Pretense to intercept any OpenAI-compatible or Anthropic API call regardless of which tool generated it.

Example

Setting ANTHROPIC_BASE_URL=http://localhost:9339 causes Claude Code to route all API calls through the Pretense proxy, which applies mutation before forwarding to api.anthropic.com.
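The mechanism can be illustrated with a small sketch of how a base-URL override reroutes request paths; `resolve_endpoint` is a hypothetical stand-in for the URL construction an SDK performs internally:

```python
import os

def resolve_endpoint(path: str, default: str = "https://api.anthropic.com") -> str:
    """Join a request path to whatever base URL the environment provides.
    Pointing the variable at a local proxy reroutes all traffic without
    any change to application code."""
    base = os.environ.get("ANTHROPIC_BASE_URL", default)
    return base.rstrip("/") + path

os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:9339"
print(resolve_endpoint("/v1/messages"))  # http://localhost:9339/v1/messages
```

The proxy then applies its transformations and forwards the request to the real endpoint, which is why interception works identically for any tool built on the same SDK convention.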

Zero-Trust AI

Architecture

A security model requiring that no AI tool or API is trusted with raw proprietary code, applying the zero-trust principle (never trust, always verify) to AI coding assistant integrations. Zero-trust AI mandates that all code sent to AI providers be inspected and transformed before transmission, regardless of the provider's stated data policies or contractual assurances. Trust is replaced with cryptographic guarantees and local-first processing.

Example

An enterprise CISO adopts a zero-trust AI policy requiring all developer AI tool usage to route through Pretense, ensuring no raw source code reaches any external AI API regardless of vendor.

Attack Vectors
4 terms

AI Data Leakage

Attack Vectors

The unintentional exposure of proprietary code, business logic, or secrets to AI model training pipelines or cloud inference endpoints. AI data leakage occurs when developers use coding assistants without realizing that context windows are transmitted to third-party servers. The leaked data may be retained, logged, or incorporated into future model training.

Example

A developer uses GitHub Copilot to refactor a payment service and inadvertently transmits a Stripe secret key and proprietary pricing algorithm to Microsoft servers.

Context Window Pollution

Attack Vectors

The unintentional injection of sensitive data into LLM context windows via code completion tools, file-aware agents, or open-ended prompts. Context window pollution is difficult to detect because the sensitive data is embedded in normal-looking coding requests. It commonly occurs when IDE extensions send surrounding file content as context without filtering.

Example

An AI coding agent automatically includes a .env file in its context window when analyzing a project, sending database credentials alongside the code query.

Model Training Contamination

Attack Vectors

The risk that code submitted to AI APIs is incorporated into future model training data, allowing the provider or third parties to reconstruct proprietary algorithms from model outputs. Major AI providers have varying data retention and training policies; some opt-out mechanisms are unreliable or apply only to certain product tiers. Even without deliberate training, code submitted to inference endpoints may be logged and retained.

Example

A startup discovers its novel sorting algorithm appears in a competitor's AI-generated code three months after submitting it unprotected to ChatGPT for debugging help.

Supply Chain AI Risk

Attack Vectors

The risk that third-party AI coding tools create vulnerabilities by exposing internal code to external parties, training data pipelines, or adversarial actors targeting AI provider infrastructure. Supply chain AI risk extends the concept of software supply chain security to the AI layer: any tool that sends your code to a third-party service is a potential supply chain exposure point. Attackers may target AI providers specifically to harvest code from high-value targets.

Example

A financial services firm assesses supply chain AI risk when evaluating GitHub Copilot, determining that transmitted code could expose proprietary risk models if the AI provider's infrastructure were compromised.

Put These Concepts into Practice

Pretense implements code mutation, byte-exact reversal, secret scanning, and AI audit logging in a single local proxy. Set it up in 5 minutes.

No credit card required. Local-first, nothing leaves your machine.
