Security · Developer Education · Copilot

AI Security 101: What Every Developer Needs to Know Before Using Copilot or Claude

Most developers do not think about what they are sending to AI tools. This is a practical primer on what leaves your machine, where it goes, what is actually at risk, and three rules every developer should follow before using AI on production code.

What Happens When You Send Code to an AI

When you paste code into Claude, Copilot, or Cursor, here is what actually happens:

1. Your code leaves your machine as an HTTPS request to an external API.
2. The API processes it on infrastructure you do not control.
3. The provider stores the interaction per its data retention policy.
4. The response is returned and the session ends.
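Concretely, step 1 is just an HTTPS POST whose JSON body contains your code verbatim. A minimal sketch of the payload shape for a chat-style API such as Anthropic's Messages API (the model name here is illustrative; nothing is actually sent):

```python
import json

code_snippet = 'STRIPE_SECRET_KEY = "sk_live_4xHj..."  # production key'

# The request body an AI client would send. Your pasted code travels
# inside the "messages" field, byte for byte.
payload = {
    "model": "claude-sonnet-4-20250514",  # illustrative model name
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": f"Why isn't this working?\n\n{code_snippet}"},
    ],
}

# Anything in the snippet, including a live key, is now in the wire payload.
body = json.dumps(payload)
print("sk_live_" in body)  # → True
```

TLS protects the payload in transit, but once it arrives, what happens next is governed by the provider's retention policy, not by you.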

In most cases this is completely fine. For a lot of code it is even desirable — LLM providers need data to improve their models, and if you are working on a hobby project, that tradeoff is reasonable.

The problem is when the code you are pasting contains things that should not leave your network.

What Is Actually at Risk

API Keys and Credentials

The most obvious risk. If your code has a hardcoded API key — even in a comment, even in a .env file you opened in the same context window — that key can appear in a prompt.

```python
# Bad: this entire file's context could be sent
STRIPE_SECRET_KEY = "sk_live_4xHj..."  # production key

def charge_customer(amount: float) -> dict:
    # developer asks: "why isn't this working?"
    ...
```

AI providers' systems are secure, but you are still transmitting a live production key to a third party. The key appears in request logs. It could be retained for model training. It creates audit trail issues for SOC 2 and similar frameworks.
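One cheap mitigation is a pre-flight scan for well-known credential formats before anything is pasted or piped to an AI tool. A minimal sketch; the patterns below cover only a few common prefixes and are illustrative, while dedicated scanners like gitleaks or trufflehog ship hundreds of rules:

```python
import re

# Illustrative patterns for common credential formats.
SECRET_PATTERNS = [
    re.compile(r"sk_live_[0-9a-zA-Z]+"),                  # Stripe live secret key
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key ID
    re.compile(r"ghp_[0-9a-zA-Z]{36}"),                   # GitHub personal access token
    re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),  # PEM private key header
]

def find_secrets(text: str) -> list[str]:
    """Return any substrings that look like live credentials."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits

snippet = 'STRIPE_SECRET_KEY = "sk_live_4xHjAbCdEf"'
print(find_secrets(snippet))  # → ['sk_live_4xHjAbCdEf']
```

Running a check like this in a pre-commit hook or editor wrapper catches the obvious cases before the paste happens, not after.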

Proprietary Business Logic

Less obvious but often more damaging. Your pricing algorithm, fraud detection logic, recommendation engine, or risk scoring model represents years of investment. Sending it to an external AI means it leaves your network.

```typescript
// This function represents real business IP
function calculateRiskScore(
  user: User,
  transaction: Transaction,
  historicalPatterns: Pattern[]
): number {
  // proprietary algorithm
  const behaviorScore = analyzeUserBehavior(user, historicalPatterns);
  const velocityRisk = checkTransactionVelocity(transaction);
  const networkRisk = assessNetworkGraph(user);
  return weightedRisk(behaviorScore, velocityRisk, networkRisk);
}
```

The function name, the parameter structure, the sub-function names — these are all identifiers that describe your system architecture to anyone who sees the prompt.

PII in Test Data

Developers often work with real production data in test environments "just to check something." Real names, email addresses, SSNs, and account numbers end up in AI prompts regularly. This is a HIPAA and GDPR liability, not just a security concern.
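If real records do end up in a snippet, a regex pass can strip the most recognizable identifiers before pasting. A rough sketch; these patterns catch obvious formats only and are no substitute for keeping production data out of test environments:

```python
import re

# Illustrative patterns; real PII detection needs far more than regexes.
REDACTIONS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "<EMAIL>"),  # email addresses
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),          # US SSN, dashed form
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "<CARD>"),        # card-number-like digit runs
]

def redact_pii(text: str) -> str:
    """Replace recognizable PII formats with placeholder tokens."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

row = "Jane Doe, jane.doe@example.com, SSN 123-45-6789"
print(redact_pii(row))  # → Jane Doe, <EMAIL>, SSN <SSN>
```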

Three Rules Before Using AI on Production Code

Rule 1: Never paste code with active credentials

Use placeholder values or environment variable references when asking AI for help with code that accesses secrets.

```python
# Instead of showing this:
api_key = "sk_live_4xHj..."  # live production key

# Show this:
api_key = os.environ["STRIPE_API_KEY"]  # or a placeholder like "STRIPE_KEY_HERE"
```

Rule 2: Sanitize proprietary identifiers in sensitive contexts

Before sending code that contains business-critical function or class names, consider whether the names themselves reveal your architecture. In high-stakes cases, rename to generics before pasting.

```python
# Original:
risk_score = calculate_risk_score(transaction)

# Sanitized for AI question:
result_score = calculate_score(transaction)
```
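When this comes up often, the rename step can be scripted. A minimal sketch using word-boundary replacement; the mapping below is illustrative, and a proxy does roughly this plus the reverse mapping on the AI's response:

```python
import re

# Illustrative mapping from proprietary names to generics.
# Keep it around so you can reverse the renames in the AI's answer.
RENAMES = {
    "calculate_risk_score": "calculate_score",
    "RiskScoringEngine": "EngineA",
}

def sanitize(code: str) -> str:
    """Replace sensitive identifiers with generic names (whole words only)."""
    for original, generic in RENAMES.items():
        code = re.sub(rf"\b{re.escape(original)}\b", generic, code)
    return code

snippet = "score = calculate_risk_score(txn)"
print(sanitize(snippet))  # → score = calculate_score(txn)
```

Word boundaries matter here: a naive string replace would also mangle longer identifiers that happen to contain a sensitive name as a substring.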

Rule 3: Use a local proxy for ongoing work

For teams doing significant AI-assisted development, a proxy like Pretense handles sanitization automatically. You keep your workflow; the proxy rewrites sensitive identifiers before they leave your network.

```bash
# One-time setup
npm install -g pretense
pretense init

# Set in your shell profile
export ANTHROPIC_BASE_URL=http://localhost:9339
export OPENAI_BASE_URL=http://localhost:9339/openai
```

After that, your normal workflow routes through the proxy without any behavior change.

What Providers Actually Do With Your Data

**Anthropic (Claude)**: API usage is not used for training by default. You can opt into training data contribution. Data is retained per their privacy policy. SOC 2 Type II certified.

**OpenAI (GPT) / GitHub Copilot**: OpenAI API usage data is not used for training by default. GitHub Copilot Individual uses prompts for product improvement; Copilot Business and Enterprise do not. OpenAI is SOC 2 Type II certified.

**Google (Gemini)**: API usage data retention varies by product and plan. Check the specific product's data processing terms.

The word "default" matters here. In all cases, the safe path is to not send data you cannot afford to share, regardless of provider policy.

The Practical Test

Before sending any code block to an AI, ask: "Would I be comfortable if a smart competitor could read this?"

If the answer is no, either sanitize it manually using the rules above, or use a proxy that handles it automatically.

[Try Pretense — 30-second setup, no configuration required](/early-access)
