
The CISO Guide to AI Coding Tool Security in 2026

AI coding tools are now standard developer infrastructure. For CISOs, that creates a new attack surface: every code completion, every prompt, every context window is a potential data exfiltration channel. This guide covers the threat model, control framework, and enforcement mechanisms.

The Threat Model Has Changed

Three years ago, the AI coding tool risk question was hypothetical. In 2026, it is an active incident category. GitHub Copilot has 1.8 million paid users. Cursor has 500,000 active developers. Claude Code is embedded in terminal workflows at thousands of engineering organizations. Every one of these tools sends developer context to third-party infrastructure.

The threat model for CISOs is no longer "should we allow AI coding tools." That ship has sailed. The question is "how do we control what leaves our network when developers use them."

What Actually Leaves the Network

Understanding the data flow is the starting point for any control framework.

GitHub Copilot

Copilot sends the current file plus surrounding workspace context (typically the 10-20 most recently edited files) to Microsoft Azure OpenAI Service. With standard Copilot Individual or Team plans, this data may be retained and used for model training. Copilot for Business disables training use but does not prevent processing on Microsoft infrastructure.

Cursor

Cursor sends the contents of open editor tabs plus any context the developer adds manually. It supports "Privacy Mode," which disables training data use. Without Privacy Mode, the default is data retention.

Claude Code (Anthropic)

Claude Code sends the full conversation context including file contents that the developer or automated hooks add. Anthropic's enterprise terms include zero data retention options. The default plan retains prompts for 30 days.

What this means for your threat model

Any of these tools can transmit: function names and class hierarchies (architecture exposure), variable names derived from business domains (IP exposure), configuration patterns that reveal tech stack, and comments that contain business logic descriptions.
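The exposure is easiest to see in a concrete snippet. The example below is hypothetical (the function name, docstring, and field names are invented for illustration), but it shows how code containing no secrets or customer data still leaks the business domain through identifiers and comments alone:

```python
# Hypothetical snippet from an insurance pricing service. Nothing here is a
# secret or a credential, yet a completion request containing it exposes:
#   - the function name (reveals a proprietary actuarial algorithm)
#   - the docstring (describes business logic in plain English)
#   - parameter names (reveal the data model of the domain)

def calculate_adverse_selection_premium(base_rate, claim_severities):
    """Apply the proprietary risk-weighting model to a renewal quote."""
    risk_weight = sum(claim_severities) / max(len(claim_severities), 1)
    return base_rate * (1 + risk_weight)
```

A competitor who sees only this function, with no other context, learns that you price renewals using a claim-severity weighting model.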

The Control Framework

A practical enterprise AI coding tool security framework has four layers.

Layer 1: Policy

Define what is and is not permissible. A policy without technical enforcement is an audit artifact that provides limited actual protection, but it is required for SOC2 and most enterprise frameworks.

Minimum policy requirements:

- Approved AI coding tools (whitelist specific tools)
- Data classification rules (what categories of code may not be sent to AI APIs)
- Incident reporting requirements (what to do when a developer accidentally sends sensitive data)
- Exception handling (how to get approval for use cases not covered by default policy)
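A policy is more useful when each requirement maps to something machine-checkable. A minimal, hypothetical policy-as-data sketch (all field names, paths, and URLs are illustrative) might look like:

```python
# Illustrative policy-as-data sketch. The point is that each requirement in
# the policy above becomes a value a proxy or CI check can enforce, rather
# than prose a developer has to remember.

AI_TOOL_POLICY = {
    "approved_tools": ["github-copilot-business", "cursor-privacy-mode", "claude-code"],
    "restricted_paths": ["services/billing/", "libs/phi-schemas/"],
    "incident_contact": "security@example.com",
    "exception_process": "https://wiki.example.com/ai-tool-exceptions",
}

def path_is_restricted(path: str) -> bool:
    """Return True if a file path falls under a restricted data classification."""
    return any(path.startswith(prefix) for prefix in AI_TOOL_POLICY["restricted_paths"])
```

The same structure feeds Layer 2: a proxy can refuse to forward any request whose file path matches `restricted_paths`.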

Layer 2: Technical Controls

Policy tells developers what to do. Technical controls prevent violations regardless of developer intent.

The options in order of control strength:

1. Network egress filtering: Block traffic to AI API endpoints at the firewall. Maximum control, eliminates AI coding tools entirely. Not a real option for most organizations in 2026.

2. Approved endpoint routing: Force all AI API traffic through an approved proxy that logs and filters requests. Allows AI tools while creating visibility and enforcement.

3. Local proxy with mutation: Run a local proxy that mutates proprietary identifiers before transit. Full protection without degrading LLM output quality.

4. IDE-level exclusions: Use .copilotignore and equivalent mechanisms to exclude sensitive directories. Low friction but unenforceable at the user level.

Option 3 (local proxy with mutation) is the only approach that provides both protection and developer productivity. It is also the only approach that generates audit logs that satisfy SOC2 CC6.7 requirements.
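The core of the mutation approach can be sketched in a few lines. This is a toy illustration, assuming a fixed mapping of proprietary identifiers to neutral aliases; a real proxy would build the mapping from the codebase and apply the reverse mapping on the response path so the developer never sees the aliases:

```python
import re

# Toy identifier-mutation sketch. The names in ALIAS_MAP are invented;
# a production proxy derives them from the repository automatically.
ALIAS_MAP = {
    "AdverseSelectionModel": "ModelA",
    "claim_severity_index": "metric_1",
}
REVERSE_MAP = {v: k for k, v in ALIAS_MAP.items()}

def mutate(text: str, mapping: dict) -> str:
    """Replace whole-word identifier matches, longest names first."""
    for name in sorted(mapping, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", mapping[name], text)
    return text

outbound = "class AdverseSelectionModel uses claim_severity_index"
masked = mutate(outbound, ALIAS_MAP)      # what leaves the network
restored = mutate(masked, REVERSE_MAP)    # what the developer sees back
```

Because the mutation is deterministic and reversible, the LLM still sees structurally valid code and its completions remain usable after the reverse mapping.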

Layer 3: Monitoring and Audit

Regardless of which technical controls you implement, you need visibility into what is being sent.

Key metrics to track:

- API calls per developer per day (anomaly detection baseline)
- Token volume per request (high token volume may indicate full file uploads)
- Detected sensitive patterns in outbound requests
- Policy exception requests and approvals

A proxy audit log captures this automatically. Without a proxy, you are relying on network logs that do not contain request body content.
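The first metric above, calls per developer per day, supports simple statistical anomaly detection. A toy sketch using the median absolute deviation (robust against the outlier itself inflating the baseline; the threshold and sample data are illustrative):

```python
from statistics import median

def flag_anomalies(daily_counts: dict, k: float = 5.0) -> list:
    """Flag developers whose daily API call count sits far above the
    team's median, measured in median-absolute-deviation units."""
    counts = list(daily_counts.values())
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    return [dev for dev, n in daily_counts.items()
            if mad > 0 and (n - med) / mad > k]

# Illustrative day of data: one developer is 10x the team baseline,
# which may indicate a script bulk-uploading files to an AI API.
calls = {"alice": 40, "bob": 35, "carol": 42, "dave": 38, "eve": 400}
```

Flagged developers are a starting point for review, not a verdict; a legitimate refactoring session can also spike call volume.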

Layer 4: Incident Response

Define the response playbook before an incident occurs. At minimum:

- How to identify what was sent (which files, which identifiers, over what time period)
- Who is notified (legal, affected business units, and the compliance team if regulated data is involved)
- Containment steps (revoking API keys, rotating credentials that may have been exposed)
- Documentation for regulators or customers if notification is required

SOC2 and Regulatory Alignment

SOC2 Type II

Control CC6.7 requires demonstrating that logical access to third-party systems is managed. AI coding tools are third-party systems. Your auditor will ask:

- What AI tools do developers use?
- What data do those tools have access to?
- What controls prevent unauthorized data disclosure?

A proxy audit log with mutation records is a direct answer to all three questions. An email to the engineering team about acceptable use is not a control.

HIPAA

If your codebase contains references to PHI data structures (patient ID schemas, claim identifiers, EHR record types), sending that code to AI APIs without a Business Associate Agreement creates exposure. Most AI coding tool providers do not offer BAAs on standard plans.

Mutation-based protection addresses this: if the identifiers that reference PHI patterns are mutated before transit, the data that leaves your network does not contain the patterns that trigger HIPAA applicability.
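Applied to PHI, the same mutation idea means scrubbing identifier names that reveal PHI data structures before transit. A hedged sketch, with illustrative patterns that are nowhere near an exhaustive HIPAA ruleset:

```python
import re

# Illustrative patterns for identifier names that reveal PHI schemas.
# A real deployment would maintain a vetted, per-codebase pattern set.
PHI_IDENTIFIER_PATTERNS = [
    r"\bpatient_id\w*\b",
    r"\bmrn\w*\b",              # medical record number fields
    r"\bclaim_identifier\w*\b",
]

def mask_phi_identifiers(code: str) -> str:
    """Replace PHI-revealing identifier names with neutral placeholders."""
    for i, pattern in enumerate(PHI_IDENTIFIER_PATTERNS):
        code = re.sub(pattern, f"field_{i}", code)
    return code
```

The AI API then sees `field_0` and `field_2` instead of `patient_id_hash` and `claim_identifier`, and the outbound payload no longer names PHI structures.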

GDPR and EU Data Residency

Enterprise AI API agreements typically allow choosing data residency regions. For EU-based engineering teams, ensure AI API contracts specify EU data processing. This is a contractual control, not a technical one.

Building the Business Case

AI coding tool security has a straightforward ROI calculation.

A proprietary algorithm leak, if it occurs and is attributable, can result in litigation costs and competitive damage far exceeding the cost of prevention. A HIPAA violation for electronic PHI carries statutory penalties starting at $100 per violation per day and reaching an annual cap of $1.9 million for willful neglect. A single SOC2 finding related to uncontrolled AI tool usage can delay enterprise sales cycles by 3-6 months.

Pretense Enterprise pricing is $99/seat/month. For a 50-person engineering team, that is $4,950/month. A single enterprise deal delayed by a SOC2 finding costs more.
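The arithmetic behind that comparison, using the figures above (the breach-cost side is whatever your own incident estimates say; the numbers below only cover the control side):

```python
# Control-cost side of the ROI comparison, using the pricing stated above.
seats = 50
price_per_seat_month = 99

monthly_control_cost = seats * price_per_seat_month      # $4,950/month
annual_control_cost = monthly_control_cost * 12          # $59,400/year
```

Weigh that annual figure against a single delayed enterprise deal, a HIPAA penalty accruing daily, or breach litigation, and the CFO conversation writes itself.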

The conversation with the CFO is straightforward once the cost of the incident is quantified against the cost of the control.

Practical First Steps

1. Inventory AI tool usage: Survey the engineering team. You likely have more tools in use than you know about.
2. Classify your codebase: Identify which repositories contain proprietary algorithms, regulated data references, or client-confidential code.
3. Deploy proxy protection on high-risk repos first: Start with the code that has the highest risk profile. Expand from there.
4. Build the audit record: The compliance conversation is easier when you can show 90 days of audit logs demonstrating controlled AI tool usage.

[Schedule a demo for your security team](/demo) or [read how other enterprise teams have deployed Pretense](/use-cases).
