AI-Generated Code Security Checklist

5 min read · 20-point checklist

AI coding assistants now draft a large share of new software, and they are remarkably good at producing code that compiles and looks correct. But "looks correct" and "is secure" are different properties. The studies summarized below found security flaws in roughly 40 to 45% of AI-generated samples, and one found that developers using an assistant were more likely to believe their insecure code was safe. That gap between perceived and actual security is the core risk to manage.

What the research actually shows

The foundational study is NYU's 2021 "Asleep at the Keyboard," which prompted GitHub Copilot across 89 scenarios drawn from MITRE's CWE Top 25 Most Dangerous Software Weaknesses and found that roughly 40% of the generated programs contained a vulnerability. This was not a one-off tied to an early model. Veracode's 2025 GenAI Code Security Report evaluated output from more than 100 large language models and found that about 45% of AI-generated samples contained a security flaw, including OWASP Top 10 issues, and that newer, larger models did not meaningfully improve on that number.

The reason is structural. These models are trained to predict plausible code from enormous public corpora, and public code is full of insecure patterns. The model optimizes for what looks like typical, working code, not for what is safe under adversarial conditions. Without explicit security guidance in the prompt, the default output reflects the average of its training data — and the average of public code is not secure.

The vulnerability classes that keep appearing

The flaws that surface in AI-generated code are not exotic; they are the long-standing classics. Injection is the dominant theme: SQL injection from string-concatenated queries, command injection, and cross-site scripting from unescaped output. Veracode found especially high failure rates for cross-site scripting and log injection. Other recurring issues include hardcoded secrets, weak or outdated cryptography, missing authentication and access-control checks, path traversal, and insecure deserialization.
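
To make the dominant pattern concrete, here is a minimal sketch of a string-concatenated query next to its parameterized equivalent, using Python's built-in sqlite3 module. The table, function names, and payload are illustrative assumptions, not taken from any of the studies cited.

```python
import sqlite3


def find_user_unsafe(conn, username):
    # Vulnerable: user input is spliced into the SQL text itself.
    query = "SELECT id, name FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Parameterized: the value is passed separately from the SQL text,
    # so quotes inside it cannot change the structure of the query.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()


def make_demo_db():
    # Hypothetical in-memory table used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "nobody' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # every row in the table
    print(find_user_safe(conn, payload))    # no rows
```

Both functions look equally plausible in a diff, which is why this class of defect survives casual review; the difference only shows up when the input is hostile.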

The practical implication is that assistants tend to fail in predictable ways — which is good news for defenders. The same OWASP Top 10 categories that drive traditional review map almost directly onto the most common AI-generated defects, so existing static analysis, secret scanners, and dependency checks remain effective. The danger is volume and velocity: assistants generate far more code, faster, so the absolute number of defects entering a codebase rises even if the per-line rate holds steady.

Package hallucination and slopsquatting

A distinctly AI-specific risk is package hallucination. When asked to write code, LLMs frequently invent plausible-sounding but non-existent dependencies. The 2024 study "We Have a Package for You!" tested 16 models and recorded more than 205,000 unique hallucinated package names; roughly 20% of the packages the models recommended did not exist. This enables an attack known as slopsquatting: an adversary registers a commonly hallucinated package name with malicious code inside, then waits for developers, or their AI agents, to install it on the model's recommendation.

This is not theoretical. Researchers demonstrated the pattern by registering a frequently hallucinated name and observing thousands of downloads within months. Because hallucinated names are often consistent across prompts and models, attackers can predict and pre-register them at scale. Verify that every suggested dependency actually exists, comes from the expected maintainer, and is pinned to a known-good version before it enters a lockfile.
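
One way to enforce that rule is a small pre-merge check that rejects any requirement that is not pinned to an exact version or not on a human-reviewed allowlist. The sketch below assumes a pip-style requirements file; the allowlist contents, the made-up package name in the usage note, and the function name are all hypothetical, and the name handling is deliberately simpler than full PyPI normalization.

```python
import re

# Hypothetical allowlist: packages and exact versions a person has already vetted.
APPROVED = {"requests": "2.32.3", "flask": "3.0.3"}

# Matches only "name==version"; ranges, URLs, and bare names are rejected.
PINNED = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([A-Za-z0-9.]+)$")


def audit_requirements(lines, approved=APPROVED):
    """Return a list of problems; an empty list means every line passed."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = PINNED.match(line)
        if not match:
            problems.append(f"not pinned to an exact version: {line}")
            continue
        # Simplification: lowercase only, not full PyPI name normalization.
        name, version = match.group(1).lower(), match.group(2)
        if name not in approved:
            problems.append(f"unreviewed package (does it exist?): {name}")
        elif approved[name] != version:
            problems.append(
                f"{name} pinned to {version}, approved is {approved[name]}"
            )
    return problems
```

Calling audit_requirements(["totally-made-up-pkg==1.0.0"]) returns one "unreviewed package" problem, which forces a person to confirm the package and its maintainer before the name reaches a lockfile.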

The human-review gap and over-trust

The most consequential finding may be behavioral. Stanford's 2022-2023 study, "Do Users Write More Insecure Code with AI Assistants?," found that participants with access to an AI assistant wrote significantly less secure code than those without — and were also more likely to believe their code was secure. The developers who trusted the assistant most and scrutinized its output least produced the most vulnerable code.

This over-trust effect undermines the usual safety net. Reviewers apply less scrutiny to code that is fluent, well-formatted, and confidently presented, and AI output is all three. Automation bias leads engineers to assume the machine "knows more," especially in unfamiliar technologies — precisely where they are least equipped to catch a subtle flaw. The output reads like a senior engineer wrote it, but carries none of that engineer's accountability or threat awareness.

Integrating review into the workflow

Treat AI-generated code as untrusted input that happens to be useful, applying the same validation, review, and testing you would demand of any external contribution. Bake automated security gates directly into the path AI code travels: static analysis (Semgrep, CodeQL, Bandit), secret scanners, and software composition analysis with dependency verification should run in CI on every change so the most common injection, secrets, and supply-chain defects are caught before merge.
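
As a rough illustration of such a gate, the sketch below runs a list of scanner commands and reports which ones failed. The function name and the (name, argv) structure are assumptions made for this example; which scanners to run, and with what arguments, depends on your pipeline.

```python
import subprocess


def run_gates(gates):
    """Run each (name, argv) gate and return the names of the gates that failed."""
    failed = []
    for name, argv in gates:
        try:
            # argv is a list, so no shell is involved in launching the tool.
            result = subprocess.run(argv, capture_output=True, text=True)
        except FileNotFoundError:
            # A scanner that is not installed counts as a failed gate,
            # so a broken CI image cannot silently skip the check.
            failed.append(name)
            continue
        if result.returncode != 0:
            failed.append(name)
    return failed
```

A CI step could call run_gates([("bandit", ["bandit", "-r", "src"])]), where src is an assumed source directory, and block the merge whenever the returned list is non-empty.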

Equally important is preserving meaningful human review. Treat assistant output as a draft from an enthusiastic but unaccountable junior developer: require a knowledgeable reviewer to understand and approve it rather than rubber-stamping fluent code. Feed security context into prompts and assistant configuration so models are steered toward validation, parameterized queries, and least-privilege patterns. None of these controls guarantees secure code, but layered together they shrink the gap between code that looks right and code that is defensible.

Key takeaway

Treat AI-generated code as untrusted input, not a trusted colleague: verify every dependency, gate it through automated scanning, and keep a critical human in the loop — the assistant's confidence is not evidence of its safety.

The AI-Generated Code Security checklist

A practical, copy-ready list to run against your own codebase, pipeline, and AI usage.

Input handling and injection

  • Validate and sanitize every external input before using it in a query, command, or file path.
  • Use parameterized queries or prepared statements; never build SQL by string concatenation.
  • Avoid passing user input to a shell; if unavoidable, use argument arrays and command allowlists, not shell strings.
  • Encode or escape untrusted data before rendering it to HTML to prevent XSS.
  • Constrain file paths to an allowed directory and reject path traversal (../).
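
Two of the items above, path containment and output escaping, can be sketched with the Python standard library alone. The function names and the surrounding HTML are hypothetical; a real application would usually rely on its web framework's template auto-escaping rather than building markup by hand.

```python
import html
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Return the resolved path if it stays under base_dir, else raise ValueError."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return candidate


def render_comment(text):
    # html.escape converts <, >, &, and quotes so the text is displayed,
    # not interpreted as markup.
    return "<p>" + html.escape(text, quote=True) + "</p>"
```

Checking the resolved path, rather than searching the raw string for "../", also rejects absolute paths and traversal sequences buried in the middle of an otherwise harmless-looking path.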

Authentication and authorization

  • Confirm every new endpoint enforces authentication on every code path, not just the happy path.
  • Check object-level authorization on each resource, not only route-level access.
  • Make sure the assistant did not stub out auth checks with TODOs or permissive defaults.
  • Verify session and token handling matches the rest of the application's conventions.
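
The difference between route-level and object-level authorization is easiest to see in code. In this minimal sketch the Invoice type, the in-memory dictionary, and the Forbidden exception are all hypothetical stand-ins for whatever models and error handling your application already has.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    id: int
    owner_id: int
    total: float


class Forbidden(Exception):
    """Raised when the caller may not access the requested object."""


def get_invoice(invoices, current_user_id, invoice_id):
    # Route-level check: the caller must be authenticated at all.
    if current_user_id is None:
        raise Forbidden("authentication required")
    invoice = invoices.get(invoice_id)
    # Object-level check: the caller must own this specific invoice.
    # "Missing" and "not yours" give the same error so IDs cannot be probed.
    if invoice is None or invoice.owner_id != current_user_id:
        raise Forbidden("invoice not available")
    return invoice
```

Generated handlers often include only the first check, so any logged-in user can read any invoice by changing the ID in the request.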

Secrets and configuration

  • Reject hardcoded API keys, passwords, and tokens — assistants often invent placeholder secrets.
  • Confirm secrets are read from environment variables or a secret store, not committed defaults.
  • Check that generated config does not disable TLS verification or enable debug mode in production.
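
A simple way to enforce these items is to load secrets through one function that refuses to start without them. The environment variable names PAYMENTS_API_KEY and APP_DEBUG and the settings keys below are assumptions chosen for illustration.

```python
import os


def require_secret(name):
    """Read a secret from the environment and fail loudly if it is absent."""
    value = os.environ.get(name, "").strip()
    if not value:
        # No fallback value: a missing secret stops startup instead of
        # silently running with a committed default.
        raise RuntimeError(f"missing required secret: {name}")
    return value


def load_settings():
    return {
        "api_key": require_secret("PAYMENTS_API_KEY"),
        # Debug stays off unless it is explicitly switched on.
        "debug": os.environ.get("APP_DEBUG", "0") == "1",
        # TLS verification is not configurable in this sketch.
        "verify_tls": True,
    }
```

Because there is no default for the key, a placeholder such as "changeme" never has a chance to reach production.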

Dependencies and external APIs

  • Verify suggested packages actually exist and are not typosquats or slopsquats; models hallucinate package names.
  • Pin and scan any new dependency, and prefer actively maintained libraries.
  • Confirm external API calls handle errors and timeouts and do not leak data into logs.

Cryptography and sensitive data

  • Reject weak or deprecated algorithms (MD5/SHA-1 for passwords, ECB mode, custom crypto).
  • Use vetted libraries for hashing and encryption rather than hand-rolled implementations.
  • Confirm PII and secrets are not logged or sent to third parties unintentionally.
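
For password storage, the contrast with a bare MD5 or SHA-1 digest looks roughly like the sketch below. It uses PBKDF2 from Python's standard library to stay self-contained; a dedicated Argon2 or bcrypt library is a common alternative. The iteration count is an assumption and should be checked against current guidance.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune to current guidance and hardware


def hash_password(password, salt=None):
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256."""
    salt = salt or os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected):
    _, digest = hash_password(password, salt)
    # compare_digest runs in constant time, so timing does not reveal
    # how many leading bytes matched.
    return hmac.compare_digest(digest, expected)
```

Store the salt and digest together; the salt is not secret, but it must be unique per password.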

Error handling and output

  • Ensure errors do not leak stack traces, queries, or secrets to end users.
  • Confirm the code fails closed (deny) rather than open (allow) when an error or edge case occurs.
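
Failing closed can be sketched as a wrapper that treats any exception, or any answer other than an explicit True, as a denial. The check callable, logger name, and response tuples are hypothetical placeholders for your own authorization logic and framework.

```python
import logging

logger = logging.getLogger("authz")


def is_allowed(check, user, action):
    """Return True only when check() explicitly returns True; deny otherwise."""
    try:
        return check(user, action) is True
    except Exception:
        # Details go to the server log, not to the caller or end user.
        logger.exception("authorization check failed for action %s", action)
        return False


def handle(check, user, action):
    if not is_allowed(check, user, action):
        # Generic message: no stack trace, query, or secret is exposed.
        return 403, "Forbidden"
    return 200, "OK"
```

The common generated anti-pattern is the reverse: an except block that logs the error and then continues as if the check had passed.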

This checklist is general guidance, not a guarantee of security. A repo audit applies these checks to your actual codebase, dependencies, and AI usage and returns prioritized findings.
