SQL Injection: Anatomy, Detection, and Defense — A Hands-On Security Practitioner's Guide
Chapter 1 of 10 · updated Jul 03, 2026
How SQL Injection Actually Works: From String Concatenation to Server Compromise
How SQL Injection Actually Works: From String Concatenation to Server Compromise
Let me show you the exact moment where a web application dies. Not metaphorically — I mean the precise byte where user input stops being data and becomes executable grammar.
In our DeafNews Secure Development Lab, ShopBox runs two variants: a MySQL backend at 192.0.2.20 and a PostgreSQL variant at 192.0.2.30. Both serve the same three functions — login, product search, order lookup — and both are deliberately vulnerable in the same ways. I'm going to walk through the login form on the MySQL instance because it's the cleanest demonstration of the mechanism, but everything here translates.
The Grammar Breakpoint
Here's the actual PHP from ShopBox's login handler, slightly cleaned for readability:
// VULNERABLE — ShopBox login.php (MySQL variant, 192.0.2.10)
$username = $_POST['user'];
$password = $_POST['pass']; $query = "SELECT * FROM users WHERE username = '" . $username . "' AND password = '" . $password . "'"; $result = mysqli_query($conn, $query);
The application thinks it's building a sentence: "Find the user where username equals [something] and password equals [something else]." But the database server doesn't see intent. It sees a string that becomes a syntax tree.
When I enter samuel and correcthorse, the final query string is:
SELECT * FROM users WHERE username = 'samuel' AND password = 'correcthorse'
The parser (the component that turns text into executable structure) tokenizes this into: SELECT, *, FROM, users, WHERE, username, =, 'samuel', AND, password, =, 'correcthorse'. The single quotes demarcate string literals. Everything inside them is data.
Now watch what happens when I enter this as username:
' OR '1'='1' --
The resulting string:
SELECT * FROM users WHERE username = '' OR '1'='1' -- ' AND password = 'anything'
The parser sees: username = '' (empty string, false), then OR (logical operator), then '1'='1' (string literal equals string literal, true), then -- (MySQL comment start, discarding everything after). The WHERE clause evaluates to TRUE. I get the first row in the users table — typically an admin.
In plain terms: I didn't "hack the database." I spoke to the parser in its own language, and the parser believed my input was part of its grammar.
Here's a crude ASCII diagram of where parsing breaks down. The top shows what the developer intended — a clean separation between grammar (keywords, operators) and data (values in quotes). The bottom shows the actual result when user input contains metacharacters (characters with special meaning to the parser, like single quotes and hyphens):
INTENDED STRUCTURE (syntax tree):
SELECT └── * FROM users WHERE └── username = [DATA: 'samuel'] └── AND └── password = [DATA: 'correcthorse'] ACTUAL STRUCTURE WITH INJECTED INPUT:
SELECT └── * FROM users WHERE └── username = '' ← empty string (data) └── OR ← operator (GRAMMAR!) └── '1'='1' ← true expression (GRAMMAR!) └── -- ' AND password... ← commented out (GRAMMAR!)
The breakpoint is the single quote. Once user input can inject a single quote, it escapes its data context and enters grammar context. Everything after that is parsed as SQL syntax, not user data.
The Five Faces of Injection
Attackers don't always get nice error messages or visible output. OWASP's 2025 Top 10 classifies injection broadly [S4], and SQL injection specifically breaks down by what the attacker can observe. I define each on first use here because these terms reappear throughout the guide.
Error-based is the noisiest and often the first thing you detect defensively. The attacker sends malformed input specifically to trigger database error messages, which leak structural information — table names, column types, sometimes even query fragments. If ShopBox's PHP doesn't suppress mysqli_error() output, a payload like ' AND (SELECT 1 FROM nonexistent_table) returns something like Table 'shopbox.nonexistent_table' doesn't exist — confirming the database name and telling the attacker they're on the right track.
UNION-based requires visible output in the application. The attacker appends UNION SELECT to combine their own query with the original, matching column counts. A classic payload against ShopBox's product search might look like:
' UNION SELECT username, password, NULL, NULL FROM users --
This works only if the original SELECT returns four columns and the application prints all four. The attacker gets credentials in what looks like product listings. [S6]
Boolean-based blind (blind meaning the attacker gets no direct data back) is where things get subtle. The application returns different pages for "no results" versus "results found," but never shows the actual database output. The attacker asks true/false questions and watches the page change. Against ShopBox's order lookup:
' AND SUBSTRING((SELECT password FROM users WHERE username='admin'),1,1)='a' --
If the first character of the admin password is 'a', the query returns rows and the page shows "Order found." If not, "No orders." One bit per request. Slow, but fully automated. [S6][S8]
Time-based blind is the same logic with a different channel. Instead of true/false page differences, the attacker injects database-specific delay functions. In MySQL: ' AND IF(1=1, SLEEP(5), 0) --. The page takes five extra seconds to load when the condition is true. No data returned, no errors shown — just timing. [S6][S8]
Stacked queries (also called batched or multiple queries) let the attacker add entirely new statements after the original, terminated by a semicolon. MySQL's PHP driver historically allowed this with mysqli_multi_query but not standard mysqli_query. PostgreSQL is more permissive. This is how you get from reading data to writing files or executing commands — covered in escalation below.
Database-Specific Grammar: MySQL vs PostgreSQL
The sources note that payloads vary across database servers [S7], and this matters for defense because you can't just pattern-match one syntax. Let me show you the practical difference using ShopBox's two backends.
Both MySQL and PostgreSQL expose database structure through an information schema — a standardized set of views describing tables, columns, and constraints. But the implementation details diverge immediately.
On ShopBox-MySQL (192.0.2.20), an attacker confirming injectability might use:
' UNION SELECT 1, table_name, 3, 4 FROM information_schema.tables WHERE table_schema=database() --
The database() function returns the current database name. information_schema.tables lists all tables the current user can see.
On ShopBox-PostgreSQL (192.0.2.30), the same intent requires different syntax:
' UNION SELECT 1, table_name, 3, 4 FROM information_schema.tables WHERE table_schema=current_schema() --
PostgreSQL uses current_schema() instead of database(). String concatenation differs too — MySQL uses CONCAT(), PostgreSQL uses || or CONCAT(). Comment syntax: MySQL accepts -- (with trailing space) or /* */; PostgreSQL accepts -- but /* */ is standard. These differences are why automated tools like sqlmap (covered in page 4) carry database-specific payload dictionaries.
Why this matters: A WAF rule blocking
database()misses PostgreSQL entirely. Defensive detection must account for both grammars, especially in mixed environments or during migrations.
The Escalation Path: Reading to Running
Data exfiltration is embarrassing. Server compromise is catastrophic. The path between them runs through database-specific features that attackers abuse, and I need to be careful here because source coverage is thin for several of these.
Authentication bypass is the simplest escalation — the ' OR '1'='1 example above. But more targeted bypasses exist. If ShopBox's query uses numeric comparison instead of string matching, or if the application logic checks mysqli_num_rows($result) > 0 without validating which user was returned, the attacker gets arbitrary account access.
Credential theft via UNION we already saw. The attacker doesn't need to bypass login if they can read the password hashes directly.
File operations in MySQL use INTO OUTFILE and INTO DUMPFILE. If the MySQL user has FILE privilege and the server configuration permits it, an attacker can write:
' UNION SELECT 1, "<?php system($_GET['cmd']); ?>", 3, 4 INTO OUTFILE '/var/www/shell.php' --
I need to flag: I don't have verified source details on INTO OUTFILE behavior, version requirements, or default restrictions [Coverage Gap]. Check your target MySQL version's documentation. The path /var/www/shell.php is illustrative — actual web roots vary.
Command execution in Microsoft SQL Server uses xp_cmdshell, an extended stored procedure. Sources mention xp_dirtree in SQL injection contexts [S8] but don't document xp_cmdshell mechanics. I won't invent syntax here. The general pattern: if enabled and the attacker can reach a SQL Server backend, stacked queries allow EXEC xp_cmdshell 'whoami'.
PostgreSQL's equivalent is COPY ... TO PROGRAM, introduced in version 9.3. The syntax pattern is:
COPY (SELECT '') TO PROGRAM 'id'
Again, I lack verified source details on exact requirements, superuser needs, or pg_hba.conf interactions [Coverage Gap]. What I can say: PostgreSQL's protocol startup includes user and database identification [S3], and the server validates against configuration files before accepting connections. This doesn't prevent injection, but it affects what the connected user can abuse.
⚠️ Authorized, defensive use only. The above techniques are demonstrated against isolated lab hosts
192.0.2.20and192.0.2.30with no external connectivity. Document every payload for defensive correlation.
Why Prepared Statements Actually Work
Sources list prepared statements as mitigation [S6], but the why is rarely explained well. It's not magic string escaping. It's grammar separation at the protocol level.
Here's the vulnerable ShopBox login again, then its fixed version:
// VULNERABLE — string concatenation, grammar breakpoint
$query = "SELECT * FROM users WHERE username = '" . $username . "' AND password = '" . $password . "'";
$result = mysqli_query($conn, $query);
// DEFENSIVE — parameterized query, grammar fixed before data arrives
$stmt = mysqli_prepare($conn, "SELECT * FROM users WHERE username = ? AND password = ?");
mysqli_stmt_bind_param($stmt, "ss", $username, $password);
mysqli_stmt_execute($stmt);
$result = mysqli_stmt_get_result($stmt);
The ? is a parameter placeholder — a marker in the grammar tree where data will later attach, but which is never itself parsed as syntax. When mysqli_prepare() runs, the database server parses the query structure once, compiles an execution plan, and returns a statement handle. The ? positions are typed ("ss" means two strings) but grammatically inert.
When mysqli_stmt_bind_param() sends the actual username and password, they travel in the protocol's parameter binding channel — not as text appended to a query string. The database server already knows this is data. Even if $username contains ' OR '1'='1, the server treats it as a string literal value, not as operators and comparisons.
PostgreSQL's PREPARE works similarly at the protocol level, with explicit parameter typing [S1]:
PREPARE user_login (text, text) AS SELECT * FROM users WHERE username = $1 AND password = $2;
The $1 and $2 are positional parameters. PostgreSQL infers types from context if unspecified, or uses explicit declarations [S1]. The prepared statement lasts for the session and can be removed with DEALLOCATE [S1]. The server automatically chooses between generic or custom execution plans via plan_cache_mode = auto (default) when parameters exist [S1].
In plain terms: Prepared statements don't "sanitize" input. They remove the attack surface entirely by separating when grammar is decided (prepare time) from when data arrives (execute time). The parser never sees user input as potentially grammatical.
This is why escaping functions are a weaker defense — they operate at the string level, trying to guess which characters need prefixing with backslashes. The grammar breakpoint still exists; you're just hoping your escaping is complete. Prepared statements eliminate the breakpoint.
The Three Surfaces You'll See Again
Throughout this guide, I return to these ShopBox functions:
| Surface | Typical Query Pattern | Injection Type Usually Found |
|---|---|---|
| Login form | SELECT * FROM users WHERE user='...' AND pass='...' | Boolean bypass, error-based |
| Product search | SELECT * FROM products WHERE name LIKE '%...%' | UNION-based, error-based |
| Order lookup | SELECT * FROM orders WHERE order_id=... AND user_id=... | Blind (boolean/time), stacked |
Each surface has different output behavior — login says yes/no, search shows multiple columns, order lookup might return nothing visible. These output characteristics determine which injection techniques are practical, which in turn determines what your detection rules must watch for.
In the next page, I'll catalog the actual payload patterns that exploit these surfaces: the tampered strings, the comment variations, the encoding tricks that evade naive filters. Then in page 3, we walk through manual exploitation of all three ShopBox surfaces, so you understand what automated tools are doing under the hood before you let sqlmap loose in page 4.
For now, the core lesson: SQL injection is not a string problem. It's a grammar problem. Fix the grammar separation, and you fix the vulnerability.