How to audit an AI-generated PHP codebase
A five-step methodology for reviewing PHP code an AI wrote — trust boundaries, sinks, auth checks, session and redirect handling, and tooling. With the grep commands.
You inherited (or wrote, fast) a PHP codebase that came out of an AI assistant. It works. You'd like to know what's in it before real users do. This is the methodology — five steps, in order, with the grep commands and the triage notes. Allow about half a day for a small project, a day or two for a real one.
The order matters. Step 1 produces the inventory the rest of the audit operates on. Step 5 catches what the previous four missed but isn't a replacement for them — automated tools see patterns, not architecture.
- Map the trust boundaries (where does user input enter the system?)
- Trace inputs to sinks (where does it land — SQL, output, files, redirects?)
- Audit auth on every mutating endpoint (CSRF, session, permission, rate limit)
- Review session, cookie, and redirect handling (the edge-case cluster)
- Run the scanner (belt and suspenders)
Step 1 — Map the trust boundaries
A trust boundary is anywhere data crosses from "outside your system" to "inside your code." Every $_GET, $_POST, $_COOKIE, $_FILES, $_SERVER read. Every file_get_contents('php://input'). Every remote API call where the response gets used. Every database row that an end user can eventually edit.
Find them all first. Everything else in the audit hangs off this inventory.
The grep:
cd your-project

# Direct superglobal access: should mostly live inside a framework helper,
# never sprawled across application code.
grep -rnE '\$_(GET|POST|COOKIE|REQUEST|FILES|SERVER)' --include='*.php' . \
  | grep -v vendor/ | grep -v node_modules/

# Raw stdin (JSON / XML API endpoints).
grep -rn 'php://input' --include='*.php' . | grep -v vendor/

# Anything matching the framework's input helper. UserSpice example:
grep -rn 'Input::get\|Input::exists' --include='*.php' . | grep -v users/
Write down every file that came back. That's your "user input touches here" list. For every entry, note the variable name and where the data goes next: which function, query, or template receives it.
What AI assistants get wrong at this layer: they tend to assume $_POST['x'] is "clean" by the time it's been assigned to a local variable. It isn't. Tainted is tainted until something actively sanitizes it.
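One way to make "actively sanitizes" concrete is to read each superglobal through a helper that returns either a validated value or null. The sketch below is a minimal illustration, not a UserSpice or PHP built-in: the helper names input_int and input_enum are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: read an integer from a request array.
 * Returns null when the key is missing or the value is not a clean
 * integer inside [$min, $max].
 */
function input_int(array $source, string $key, int $min = 0, int $max = PHP_INT_MAX): ?int
{
    if (!isset($source[$key]) || !is_scalar($source[$key])) {
        return null; // missing, or an array such as ?id[]=1
    }
    $value = filter_var($source[$key], FILTER_VALIDATE_INT, [
        'options' => ['min_range' => $min, 'max_range' => $max],
    ]);
    return $value === false ? null : $value;
}

/** Hypothetical helper: accept a string only if it is one of a fixed set. */
function input_enum(array $source, string $key, array $allowed): ?string
{
    $value = $source[$key] ?? null;
    return (is_string($value) && in_array($value, $allowed, true)) ? $value : null;
}
```

A call such as input_int($_GET, 'id', 1) then gives the rest of the code an int or null, never the raw string. During the audit, every superglobal read that does not pass through something like this goes on the inventory.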
Step 2 — Trace inputs to sinks
For each trust boundary, follow the variable until it gets used. The sinks that matter:
- SQL: any $db->query(), $pdo->query(), mysqli_query(), $stmt->execute(). Concatenation is the bug; placeholders are the fix.
- HTML output: any <?= $var ?>, echo $var, print($var) in a template. Unescaped is the bug; htmlspecialchars() / safeReturn() is the fix.
- JavaScript embedding: anything injected into a <script> block. Escaping for HTML isn't enough here; use json_encode with the HEX flags or safeJsonEncodeForJs().
- Shell / process invocations: exec(), system(), shell_exec(), passthru(), proc_open(). Any of these touching user input is a finding.
- File paths: fopen(), file_get_contents(), require, include, unlink() with user-controlled paths. Path traversal lives here.
- Redirects: header('Location: ' . $...). Open redirects + CRLF injection.
- Deserialization: unserialize() on cookie or POST data can lead to RCE through object injection.
- Outbound HTTP: curl_* or file_get_contents on a URL the user controls is SSRF.
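For the first three sinks, the fixed versions might look like the sketch below. It assumes PHP 8.0 or later, and the function names (e, js_value, find_user_by_email) and the users table with id and email columns are hypothetical, chosen only for illustration.

```php
<?php
declare(strict_types=1);

/** Escape a value for an HTML text node or a quoted attribute. */
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

/** Encode a value for embedding inside a <script> block. */
function js_value(mixed $value): string
{
    // The HEX flags turn <, >, &, ' and " into \uXXXX escapes, so a
    // string containing "</script>" cannot close the surrounding block.
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

/** Look up a user with a bound placeholder instead of string concatenation. */
function find_user_by_email(PDO $pdo, string $email): ?array
{
    // Assumed schema: a "users" table with "id" and "email" columns.
    $stmt = $pdo->prepare('SELECT id, email FROM users WHERE email = :email');
    $stmt->execute([':email' => $email]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}
```

In a template this reads as <?= e($name) ?> for HTML and var data = <?= js_value($data) ?>; inside a script block. When reviewing, any sink that receives a request-derived value without passing through an equivalent of these is a candidate finding.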
The grep:
grep -rnE 'query\s*\(|exec\s*\(|system\s*\(|shell_exec\s*\(' --include='*.php' .
grep -rnE "header\s*\(\s*[\"']Location" --include='*.php' .
grep -rnE 'unserialize\s*\(|eval\s*\(' --include='*.php' .
grep -rnE 'file_get_contents\s*\(\s*\$' --include='*.php' .
grep -rnE '<\?=\s*\$' --include='*.php' . # unescaped echo
Each match goes into a "review" pile. For each one, answer the question: could user input reach this line? If yes, that's a finding. If no (because the only path to the variable is a constant or a controlled enum), note it and move on.
What AI assistants get wrong here: they'll happily include $page_param because "the caller should validate," with no caller actually validating. They'll happily embed user IDs in a redirect because "they're already logged in," forgetting that the user can type anything into the URL.
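For the include case, one common fix is an allow-list: the user's string is only ever used as a lookup key, never as a path. A minimal sketch, assuming PHP 8.0 or later; resolve_page, path_is_inside, and the page map are hypothetical names for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Resolve a user-supplied page name to a file, or null if it is not listed.
 * The map is the validation: the request value is only used as an array key.
 */
function resolve_page(string $requested, array $pages): ?string
{
    return $pages[$requested] ?? null;
}

/** Confirm a path really sits inside the base directory (blocks ../ traversal). */
function path_is_inside(string $baseDir, string $candidate): bool
{
    $base = realpath($baseDir);
    $real = realpath($candidate);
    if ($base === false || $real === false) {
        return false; // one of the two does not exist
    }
    return str_starts_with($real, rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR);
}

// Hypothetical usage:
//   $pages = ['home' => __DIR__ . '/pages/home.php'];
//   $file  = resolve_page($_GET['page'] ?? '', $pages);
//   if ($file === null) { http_response_code(404); exit; }
//   require $file;
```

The second function covers cases where a fixed map is impractical, such as user uploads: resolve the real path first, then check that it still lives under the directory you intended.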
Step 3 — Audit auth on every mutating endpoint
"Mutating endpoint" = any URL that inserts, updates, or deletes data, or that sends an email, or that triggers an external API call. List them, then for each one verify all four of these:
- The user is logged in. Some kind of session check at the top of the file. UserSpice calls this securePage(); vanilla PHP needs session_start() followed by an isset($_SESSION['user_id']) guard.
- The user is allowed. Permission check matching what the parent page enforced: your roles, your groups, your authz logic. The most common failure mode is "the page checks, the AJAX parser doesn't, the AJAX parser is what mutates."
- The CSRF token is verified. See the CSRF page; the short version is "every POST has hash_equals() or Token::check() on the token before any DB write."
- Rate limiting if it's an auth-adjacent endpoint. Login, register, password reset, magic link, 2FA verify. Tracked by IP and by identifier (email/username).
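The fourth check is the one most often missing entirely, so it helps to know what a minimal version looks like. The sketch below is a sliding-window counter under stated assumptions: the function names are hypothetical, the limit of 5 attempts per 15 minutes is an arbitrary example, and where the attempt timestamps are stored (database table, cache) is left to the application.

```php
<?php
declare(strict_types=1);

/**
 * Sliding-window check: has this key used up its attempts?
 * $attempts is a list of Unix timestamps of earlier attempts for one key,
 * for example "ip:203.0.113.5" or "email:a@example.com".
 */
function too_many_attempts(array $attempts, int $now, int $limit = 5, int $windowSeconds = 900): bool
{
    // Keep only the attempts that fall inside the window.
    $recent = array_filter($attempts, fn (int $t): bool => $t > $now - $windowSeconds);
    return count($recent) >= $limit;
}

/** A login is throttled if either the IP or the identifier is over the limit. */
function login_is_throttled(array $byIp, array $byIdentifier, int $now): bool
{
    return too_many_attempts($byIp, $now) || too_many_attempts($byIdentifier, $now);
}
```

Tracking both keys matters: limiting by IP alone lets a distributed attacker hammer one account, and limiting by identifier alone lets one IP spray many accounts.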
The grep — find the mutating endpoints:
# Files that contain a write call
grep -rlE 'INSERT|UPDATE|DELETE|->insert\(|->update\(|->delete\(' \
  --include='*.php' . | grep -v vendor/ | grep -v users/

# AJAX endpoints (always need re-checking: they're a separate request)
find . -type d -name parsers -not -path '*/vendor/*'
For each file, open it and confirm the four checks. The audit pattern is mechanical: if any of the four is missing, write down the file path and which check is missing. You're building a punch list.
What AI assistants get wrong here: the AJAX-parser case, every time. They'll write a perfectly secure form page that posts to parsers/save.php, and then write parsers/save.php as if it inherits the calling page's auth. It doesn't. The parser is its own HTTP request and has to re-do every check.
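What "re-do every check" could look like at the top of a vanilla PHP parser is sketched below. It is an illustration under assumptions, not UserSpice's API: the function name, the 'permissions' and 'csrf_token' session keys, and the 'csrf' POST field are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Decide whether a parser request may proceed. Returns an HTTP status code:
 * 200 means go ahead, anything else means stop and send that status.
 */
function authorize_parser_request(
    string $method,
    array $session,
    array $post,
    string $requiredPermission
): int {
    if ($method !== 'POST') {
        return 405; // mutating endpoints should not respond to GET
    }
    if (!isset($session['user_id'])) {
        return 401; // check 1: logged in
    }
    if (!in_array($requiredPermission, $session['permissions'] ?? [], true)) {
        return 403; // check 2: allowed
    }
    $expected  = $session['csrf_token'] ?? '';
    $submitted = $post['csrf'] ?? '';
    if (!is_string($expected) || !is_string($submitted) || $expected === ''
        || !hash_equals($expected, $submitted)) {
        return 403; // check 3: CSRF token, compared in constant time
    }
    return 200;
}

// Hypothetical usage at the top of parsers/save.php:
//   session_start();
//   $status = authorize_parser_request($_SERVER['REQUEST_METHOD'], $_SESSION, $_POST, 'edit_posts');
//   if ($status !== 200) { http_response_code($status); exit; }
```

Returning a status code instead of calling exit inside the function keeps the decision testable. The parser itself still has to stop on anything other than 200 before it touches the database.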
Step 4 — Review session, cookie, and redirect handling
The cluster of "looks fine in isolation, has a real bug when you think about it" patterns. Five things to check:
- Session cookie flags. Secure, HttpOnly, and SameSite all set. The bootstrap should call session_set_cookie_params() before session_start(), or the framework should be doing it for you. Verify by visiting the site and inspecting the cookie in DevTools.
- Session fixation. After a successful login, the session ID should rotate (session_regenerate_id(true)). Otherwise an attacker who set the session ID beforehand keeps it after the user logs in.
- "Remember me" tokens. Stored in a cookie that's database-backed, rotated on use, invalidated on logout, generated with random_bytes(32). Not md5(user_id . secret), which is the AI default.
- Redirects from user input. ?next= parameters and friends. Either whitelist the targets or use a sanitizer that enforces same-origin (UserSpice: Redirect::sanitized()).
- Logout. Calls session_destroy() and clears the session cookie (setcookie(session_name(), '', time() - 3600, '/')). Otherwise the session ID lingers and "logout" doesn't fully log out.
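Two of these items, remember-me tokens and redirect targets, reduce to small pure functions. The sketch below assumes PHP 8.0 or later. The function names are hypothetical, this is not how UserSpice's Redirect::sanitized() is implemented, and storing only a SHA-256 hash of the token in the database is a design choice added here rather than something the checklist requires.

```php
<?php
declare(strict_types=1);

/** Create a remember-me token: the raw value goes in the cookie, the hash in the DB. */
function new_remember_token(): array
{
    $token = bin2hex(random_bytes(32)); // 64 hex chars from a CSPRNG
    return ['cookie' => $token, 'stored' => hash('sha256', $token)];
}

/** Compare a cookie value against the stored hash in constant time. */
function remember_token_matches(string $cookieValue, string $storedHash): bool
{
    return hash_equals($storedHash, hash('sha256', $cookieValue));
}

/** Accept only local, single-slash absolute paths as redirect targets. */
function safe_redirect_target(mixed $next, string $fallback = '/'): string
{
    if (!is_string($next) || $next === '') {
        return $fallback;
    }
    // Control characters (CR/LF included) and backslashes never belong in a local path.
    if (preg_match('/[\x00-\x1F\x7F\\\\]/', $next) === 1) {
        return $fallback;
    }
    // "/dashboard" is fine; "//evil.example" and "https://evil.example" are not.
    if ($next[0] !== '/' || str_starts_with($next, '//')) {
        return $fallback;
    }
    return $next;
}
```

Rotation on use means issuing a fresh token with new_remember_token() each time one is accepted and deleting the old row. The redirect check is deliberately strict: it rejects absolute URLs even on your own domain, which is usually acceptable for a ?next= parameter.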
The grep:
grep -rn 'session_start\|session_set_cookie_params\|session_regenerate_id' --include='*.php' .
grep -rnE 'setcookie\s*\(' --include='*.php' .
grep -rnE "header\s*\(\s*[\"']Location" --include='*.php' .
What AI assistants get wrong here: all five, in roughly equal proportions. Session handling is one of those topics that has a "right answer per framework," and AI's default falls back to "vanilla PHP from 2010" — which means no flags, no rotation, no proper logout, and remember-me tokens that look secure but aren't.
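For comparison with the 2010-style default, a vanilla PHP session lifecycle with flags, rotation, and a full logout might look like the sketch below. It assumes PHP 7.3 or later (for the array form of session_set_cookie_params() and setcookie()) and a site served over HTTPS. The function names are hypothetical, and SameSite=Lax is an assumption; Strict is tighter if your flows allow it.

```php
<?php
declare(strict_types=1);

/** Flags for the session cookie. */
function session_cookie_options(): array
{
    return [
        'lifetime' => 0,     // cookie ends with the browser session
        'path'     => '/',
        'secure'   => true,  // HTTPS only
        'httponly' => true,  // not readable from JavaScript
        'samesite' => 'Lax',
    ];
}

/** Bootstrap: the cookie parameters must be set before session_start(). */
function start_secure_session(): void
{
    session_set_cookie_params(session_cookie_options());
    session_start();
}

/** After credentials are verified: rotate the ID, then mark the user logged in. */
function on_login_success(int $userId): void
{
    session_regenerate_id(true); // true also deletes the old session data
    $_SESSION['user_id'] = $userId;
}

/** Logout: empty the data, expire the cookie, destroy the server-side session. */
function logout(): void
{
    $_SESSION = [];
    if (ini_get('session.use_cookies')) {
        $p = session_get_cookie_params();
        setcookie(session_name(), '', [
            'expires'  => time() - 3600,
            'path'     => $p['path'],
            'domain'   => $p['domain'],
            'secure'   => $p['secure'],
            'httponly' => $p['httponly'],
            'samesite' => $p['samesite'],
        ]);
    }
    session_destroy();
}
```

When auditing, the quickest comparison is structural: find the project's equivalents of these three moments (bootstrap, login success, logout) and check that each call shown here has a counterpart.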
Step 5 — Run the scanner
The first four steps are a human audit — they catch architecture-level issues, missing patterns, and reasoning-about-trust mistakes that no tool can find. Once you've done them, run a scanner to catch the patterns you missed.
You want, at minimum:
- Static analysis with PHP-aware rules: Semgrep with a PHP rule pack, or Psalm at security-focused config. Catches SQL concatenation, unescaped output, weak crypto, dangerous sinks.
- Dependency CVE scan: Trivy or composer audit against your lock file. Catches "your dependency has a known RCE."
- Secrets scan: Gitleaks across the working tree and git history. Catches API keys that got committed.
- Runtime scan: ZAP against a staging copy. Catches missing security headers, cookies without flags, runtime XSS, and the absence-of-protection cases static tools struggle with.
On UserSpice, the Security Scanner bundles all four (plus PHPStan and a headers check) into a single bash entry point with UserSpice-aware rule packs. Off UserSpice, the same four tools are available individually and free.
What scanners are good at: the mechanical patterns. What they're bad at: the absence of code (no rate limit, no audit log, no permission check on an admin route), and anything that requires reasoning about your data model. That's why they're step 5, not step 1.
Triage: what to do with the findings
You'll come out of steps 1–5 with a list. Bucket every item into one of four severities, in this order:
- Critical: anything that lets an unauthenticated attacker take over an account, dump the database, or execute arbitrary code. SQL injection on a login form. RCE via unserialize. Hardcoded admin password. Fix today, before the next deploy. If already deployed, take the endpoint down or block it at the WAF until you can fix it.
- High: anything that lets an authenticated user escalate privileges, read data they shouldn't, or that breaks under predictable attacker input. Mass assignment. Authz check on the page but not the parser. Fix this week.
- Medium: XSS in user-facing fields, missing rate limits, weak tokens that haven't been broken yet. Fix this sprint.
- Low: missing security headers, verbose error messages in production, outdated dependencies with no known active exploit. Roll into the next maintenance pass.
Track them somewhere — a spreadsheet, an issue tracker, a markdown file in the repo. The goal is to leave the audit with a punch list, not a pile of half-remembered concerns. The half-remembered version is how things ship.
How long does this take?
Rough estimates for a real codebase:
- Small project (~10 files, single feature) — half a day, end-to-end.
- Medium (~50 files, several features, AJAX endpoints) — 1–2 days. Most of that is steps 2–3.
- Large (200+ files, plugins, admin section, real auth model) — 3–5 days for a first pass, plus ongoing work to address the findings.
The first project you audit takes longer than the second. By the fifth, you're flying — the patterns repeat across codebases more than you'd expect.