Pyxis.tech

2026-05-12
By Pyxis Communication

AWS AI Security Agent: Our Real-World Experience Putting It to the Test

In cybersecurity, ignoring the tools that are redefining the field is not an option. That’s why, at Pyxis, we ran proof-of-concept (POC) tests with AWS AI Security Agent as part of our offensive security activities — and we want to share what we found: how it works, what surprised us, and where its real limits are.

How Does AWS AI Security Agent Work?

AWS AI Security Agent runs penetration tests in four sequential phases. The structure is intended to ensure that every finding is backed by real evidence and can be reproduced:

Phase	Description
Preflight	Connectivity and access validation. The agent verifies it can reach the target domains before starting any test.
Static Analysis	Analysis of source code, configuration, and architecture (if provided). The agent understands the application before attacking it.
Penetration Testing	Real-time exploitation testing. The agent launches attacks tailored to each application’s context.
Finalizing	Finding validation and report generation. Vulnerabilities are confirmed and documented with reproducible evidence.

One important differentiator: the agent can optionally analyze source code, design documentation, and architecture diagrams before starting real-time tests. This allows it to understand the application before attacking it, significantly expanding coverage depth. Each finding includes a severity rating based on CVSS v3.1, a confidence level based on successful exploitation, and detailed reproduction steps.
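To make the severity rating concrete, here is a minimal sketch of how a CVSS v3.1 base score is derived from a vector string, using the weights and rounding rule from the public CVSS v3.1 specification. The function names (base_score, severity) are our own for illustration; this is not the agent's code, only the standard formula it reports against.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required weights depend on whether Scope is (U)nchanged or (C)hanged.
PR = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},
      "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector like 'CVSS:3.1/AV:N/AC:L/...'."""
    metrics = dict(part.split(":") for part in vector.split("/"))
    scope = metrics["S"]
    iss = 1 - ((1 - CIA[metrics["C"]])
               * (1 - CIA[metrics["I"]])
               * (1 - CIA[metrics["A"]]))
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (8.22 * AV[metrics["AV"]] * AC[metrics["AC"]]
                      * PR[scope][metrics["PR"]] * UI[metrics["UI"]])
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))


def severity(score: float) -> str:
    """Map a base score to the qualitative rating bands of CVSS v3.1."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```

For instance, base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H") evaluates to 9.8, which falls in the Critical band.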

How Tests Are Organized: Workspaces and Configuration

The agent’s working model is built around the concept of a workspace. From a workspace, you create and manage individual pentests and assign access permissions to specific team members, making it easy to collaborate and control who can view or run each test.

Before starting a pentest, the agent offers several configuration options that provide precise scope control:

— Exclusion of specific paths within the target domain, avoiding out-of-scope areas.

— Exclusion of external domains the target application interacts with, to avoid unintended impact.

— Custom HTTP header configuration to identify requests generated by the agent. In our tests, we used the header pyxis: security-agent to differentiate them from legitimate traffic.

— Support for authenticated testing, allowing evaluation of functionality that requires an active session.

— Integration with source code repositories (GitHub), design documents, or architecture diagrams for white-box testing.
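The exclusion rules and the marker header can be pictured with a small sketch. Everything below is hypothetical: ScopeConfig, is_excluded, and is_agent_request are names we made up for illustration, and the agent's real configuration format is not shown here. Only the header pyxis: security-agent comes from our tests. The second function is the kind of check we could run over web server logs to separate agent traffic from legitimate traffic.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class ScopeConfig:
    # Illustrative field names, not the agent's actual settings.
    target_host: str
    excluded_paths: list[str] = field(default_factory=list)
    excluded_domains: list[str] = field(default_factory=list)
    marker_header: tuple[str, str] = ("pyxis", "security-agent")


def is_excluded(url: str, cfg: ScopeConfig) -> bool:
    """Return True if the URL hits an excluded domain or an excluded path."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in {d.lower() for d in cfg.excluded_domains}:
        return True
    if host != cfg.target_host.lower():
        return False
    path = parts.path or "/"
    # Match whole path segments so "/admin/billing" does not
    # accidentally exclude "/admin/billing-export".
    return any(
        path == prefix or path.startswith(prefix.rstrip("/") + "/")
        for prefix in cfg.excluded_paths
    )


def is_agent_request(headers: dict[str, str], cfg: ScopeConfig) -> bool:
    """Check for the marker header; HTTP header names are case-insensitive."""
    name, value = cfg.marker_header
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    return normalized.get(name.lower()) == value
```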

During execution, the agent displays in real time the set of tests it’s running, allowing the team to follow progress and understand which attack vectors are being explored at any given moment.

What We Tested: Our POCs on Web Applications

We ran proof-of-concept tests on two web applications with completely different architectures, which allowed us to evaluate the agent’s ability to adapt across different contexts. In both cases, the agent started from a single entry point and autonomously expanded the attack surface as it discovered new endpoints.
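One way to picture this kind of expansion from a single entry point is a breadth-first traversal that stays on the target host. The sketch below is an assumption on our part: we do not know how the agent actually discovers endpoints, and discover and get_links are hypothetical names. The link source is passed in as a function so the example runs against an in-memory map instead of a live site.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urljoin, urlsplit


def discover(
    entry_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_endpoints: int = 500,
) -> list[str]:
    """Breadth-first expansion from one URL, restricted to the entry host."""
    host = urlsplit(entry_url).hostname
    seen = {entry_url}
    visited: list[str] = []
    queue = deque([entry_url])
    while queue and len(visited) < max_endpoints:
        current = queue.popleft()
        visited.append(current)
        for link in get_links(current):
            absolute = urljoin(current, link)
            # Skip other hosts and anything already queued.
            if urlsplit(absolute).hostname != host or absolute in seen:
                continue
            seen.add(absolute)
            queue.append(absolute)
    return visited


# Made-up link map standing in for real HTTP responses.
SITE = {
    "https://app.example.com/": ["/login", "/api/posts", "https://cdn.example.net/app.js"],
    "https://app.example.com/login": ["/api/session"],
    "https://app.example.com/api/posts": ["/api/posts/1", "/login"],
}

endpoints = discover("https://app.example.com/", lambda url: SITE.get(url, []))
```

The max_endpoints cap is a design choice for the sketch: any autonomous crawler needs some bound so that discovery terminates on large or cyclic sites.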

Test Coverage

The agent executed a broad and structured set of tests. The table below summarizes the categories covered:

Category	Tests Executed
Code Injection	SQL Injection, Command Injection, Code Injection, Server-Side Template Injection (SSTI), HTML Injection
Authentication & Tokens	JWT Vulnerabilities (alg:none, kid injection, brute force), horizontal and vertical Privilege Escalation
Access Vulnerabilities	IDOR (Insecure Direct Object Reference), Local File Inclusion, Path Traversal
Client-Side Attacks	Reflected, stored, and DOM-based XSS, CSRF
Web Infrastructure Security	SSRF (Server-Side Request Forgery), Arbitrary File Upload, XXE (XML External Entity)
SSL/TLS Configuration	BREACH, BEAST, HSTS, Fallback SCSV — using testssl.sh

It’s worth noting that in neither POC did we observe the agent performing network-level tasks, such as open port reconnaissance or infrastructure scanning. All tests were focused exclusively on the application layer.

Key Findings by Application

Application 1 — Web Application (Node.js)

The agent discovered and evaluated 38 endpoints, including the main domain and a variety of internal APIs related to blog services, member management, forms, CRM, and analytics.

The most relevant findings from this POC:

JWT Vulnerabilities: the agent identified and exploited the alg:none vulnerability in JWT tokens, forging tokens with elevated privileges (isAdmin: true, role: SITE_OWNER) and testing access to restricted admin endpoints.
Privilege Escalation: using the forged JWT tokens, both horizontal and vertical privilege escalation were attempted, reaching platform administration endpoints.
Stored XSS: advanced payloads were tested in blog post creation fields, comments, tags, and author fields, in both authenticated and unauthenticated contexts.
SSRF: internal URLs (including the AWS metadata endpoint) were injected into form submission parameters and webhook configuration fields.
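To illustrate the alg:none issue from the first finding above, here is a minimal, standard-library-only sketch. It builds an unsigned token carrying the same kind of elevated claims and shows a verifier that pins the expected algorithm and therefore rejects it. The helper names and the secret are hypothetical, and this is not the application's or the agent's code; a production service would normally rely on a maintained JWT library configured with an explicit algorithm allow-list.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _encode(obj: dict) -> str:
    return b64url(json.dumps(obj).encode("utf-8"))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token signed with HMAC-SHA256."""
    signing_input = _encode({"alg": "HS256", "typ": "JWT"}) + "." + _encode(claims)
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(digest)


def forge_unsigned(claims: dict) -> str:
    """Build an alg:none token: valid structure, empty signature."""
    return _encode({"alg": "none", "typ": "JWT"}) + "." + _encode(claims) + "."


def verify_hs256(token: str, secret: bytes) -> dict:
    """Accept only HS256 tokens; never trust the header to pick the algorithm."""
    pieces = token.split(".")
    if len(pieces) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = pieces
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected alg: %r" % header.get("alg"))
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))
```

A vulnerable verifier is one that reads alg from the token header and skips signature checking when it says none; pinning the algorithm on the server side closes that path.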

Application 2 — Next.js Application with RESTful API

In this application, the agent started from a single endpoint and autonomously expanded scope to cover 106 endpoints, including authentication, management, and document routes.

The most significant findings from this POC:

CRITICAL Vulnerability — Hardcoded Credentials: the agent detected and exploited integration credentials exposed in client-side JavaScript bundles. Using these credentials, it authenticated and accessed sensitive endpoints. This finding resulted in a confirmed, real privilege escalation.
Incorrect Error Handling (Medium): 9 endpoints returned HTTP 500 Internal Server Error instead of proper 401/403 responses when accessed with CORS headers but no credentials, exposing information about the internal authentication flow.
TLS/SSL Vulnerabilities: using testssl.sh, the agent identified multiple weaknesses: BREACH (HTTP compression over HTTPS), TLS Fallback SCSV issues, and BEAST due to CBC cipher suites with TLS 1.0.
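The hardcoded-credentials finding is the kind of issue a simple pre-release check can sometimes catch. The sketch below scans bundle text for credential-like assignments. The patterns are illustrative assumptions rather than an exhaustive rule set, scan_bundle and redact are hypothetical names, and this is not how the agent detected the issue.

```python
import re

PATTERNS = {
    # AWS access key IDs: a known prefix followed by 16 uppercase alphanumerics.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Identifier containing a credential-like word, assigned a quoted string.
    "assigned_secret": re.compile(
        r"""(?i)([\w$]*(?:api[_-]?key|secret|password|token)[\w$]*)"""
        r"""\s*[:=]\s*["']([^"']{8,})["']"""
    ),
}


def redact(value: str) -> str:
    """Keep only the first four characters so reports do not leak secrets."""
    return value[:4] + "*" * max(len(value) - 4, 0)


def scan_bundle(text: str) -> list[tuple[str, str]]:
    """Return (pattern name, redacted value) pairs for each suspicious match."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the last capture group when present, else the whole match.
            value = match.group(match.lastindex or 0)
            hits.append((name, redact(value)))
    return hits
```

Regex scanning produces false positives and misses obfuscated values, so results from a check like this still need human review.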

What Impressed Us Most: Report Quality

The aspect that surprised us most was the final report. Unlike traditional automated tool reports — which typically include generic descriptions copied from vulnerability databases — AWS AI Security Agent generates descriptions tailored to the specific context of each evaluated application.

Each finding includes:

A contextualized description of the vulnerability, explaining its concrete impact on the evaluated application.
Detailed, reproducible steps to replicate the attack, including the exact commands used.
Exploitation evidence with real requests and responses captured during testing.
Severity classification with a CVSS v3.1 score and confidence level based on successful validation.
False positive identification with technical justification: not just flagging findings, but explaining why they were ruled out.
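The elements listed above map naturally onto a small record type. The sketch below is our own modeling of such a finding, with assumed field names and confidence labels; it does not reflect the agent's actual report schema. The triage helper shows one way a team might separate confirmed high-severity findings from those ruled out as false positives.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    title: str
    description: str                 # contextualized impact on the application
    cvss_score: float                # CVSS v3.1 base score
    confidence: str                  # assumed labels: "high", "medium", "low"
    reproduction_steps: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)   # captured requests/responses
    false_positive: bool = False
    justification: str = ""          # why a finding was ruled out, if it was


def triage(findings: list[Finding], min_score: float = 7.0):
    """Split findings into urgent confirmed ones and ruled-out false positives."""
    urgent = sorted(
        (f for f in findings if not f.false_positive and f.cvss_score >= min_score),
        key=lambda f: f.cvss_score,
        reverse=True,
    )
    ruled_out = [f for f in findings if f.false_positive]
    return urgent, ruled_out
```

Keeping the justification alongside ruled-out findings preserves the reasoning, so a reviewer can challenge it later instead of only seeing that something was dismissed.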

The Value of Human Judgment Over Raw Compute

During our tests with AWS AI Security Agent, we experienced a moment that precisely illustrates the boundary between computational power and human judgment.

The agent performed a fast, thorough sweep of the web application. Within minutes, it identified the application’s structure, analyzed access policies, scanned for known vulnerability patterns, and exploited them to confirm its findings. Yet it missed a critical finding: a directory in the application that exposed a file containing sensitive information.

Two perspectives on the same finding

— For the agent: it was a valid, accessible route based on configured permissions.

— For our team: it was a risk affecting confidentiality that could be leveraged to gain access and damage the client’s reputation.

Without enough context about the document’s contents, the agent interpreted it as valid and legitimate information. Our team, on the other hand, understands the context, the business, and the client.

AI provides the power. We provide the judgment and the strategy.

Expanding Our Team’s Reach with AI

Integrating AWS AI Security Agent changes how we scale our capabilities as a team. For applications whose software development lifecycle (SDLC) calls for continuous review before each production release, AI-powered security testing enables us to:

Reduce execution time without compromising quality or development velocity.
Get preliminary results in hours, not days.
Focus specialists on techniques that require human context and reasoning: business logic manipulation, vulnerability chaining, indirect flow attacks, and more.

Security testing agents are already a reality. But that doesn’t mean our team stops exploiting vulnerabilities. It means we can focus our full analytical capacity on designing more sophisticated attack strategies, exploiting more complex vulnerabilities, and building attack chains that require human context. This synergy allows for broader coverage and a significantly deeper level of analysis on every engagement.

How Far Does AWS AI Security Agent Reach?

AWS AI Security Agent operates at the application layer. The attack surfaces and testing modes it covers include:

Web applications and web services / APIs.
Applications running on multicloud and on-premises environments.
AWS cloud services and configurations.
Applications in private environments within a VPC.
Authenticated testing.
White-box testing, when access to source code repositories (GitHub), design documents, or architecture diagrams is provided.

What It Doesn’t Cover Yet

Like any tool, AWS AI Security Agent has concrete limitations that are important to understand in order to use it as a complement to your team’s work:

Mobile applications: does not support testing on Android or iOS apps.
Non-standard ports: we encountered difficulties evaluating applications not exposed on the standard HTTP and HTTPS ports (80 and 443).
IP ranges (CIDR): does not support running tests directly against network ranges.
Non-web protocols: does not cover services such as SSH, FTP, SMB, RDP, or other protocols outside the HTTP/HTTPS scope.
Infrastructure layer: does not perform tests on physical devices or network infrastructure (port reconnaissance, CIDR scanning).
Fixed test catalog: although broad, the technique catalog is predefined. Highly customized techniques or those requiring specific business logic remain in the human team’s domain.
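Several of these limits can be checked before a target is even submitted. The sketch below is a hypothetical pre-check of our own (target_warnings is not part of the agent or any AWS API) that flags CIDR ranges, non-HTTP schemes, and non-standard ports, based only on the limitations we observed.

```python
import ipaddress
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def target_warnings(target: str) -> list[str]:
    """Return reasons a target may fall outside what we saw the agent handle."""
    # A bare "address/prefix" string is treated as a network range.
    if "/" in target and "://" not in target:
        try:
            ipaddress.ip_network(target, strict=False)
            return ["CIDR range: network ranges are not supported"]
        except ValueError:
            pass
    parts = urlsplit(target)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        return [f"non-web protocol or missing scheme: {scheme or 'none'}"]
    try:
        port = parts.port or DEFAULT_PORTS[scheme]
    except ValueError:
        return ["invalid port in URL"]
    if port not in (80, 443):
        return [f"non-standard port {port}: we had difficulties with these"]
    return []
```

An empty list means none of the observed limits apply; it does not guarantee the target is fully testable.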

Now in GA: What Does That Mean?

AWS AI Security Agent has reached general availability (GA), graduating from its preview phase. A move to GA generally signals that a service has passed AWS’s internal stability, security, and scalability validations, and that AWS considers it production-ready at scale. For an AI-powered automated pentesting tool, it also suggests that adoption is growing fast enough to justify the move.

On pricing: AWS has published a pay-per-use cost structure based on pentest execution time and resources consumed, similar to other managed AWS services. This makes it an accessible option for teams that need to scale their security testing without the fixed costs of traditional pentesting tools.

For teams evaluating integrating it into their SDLC, the time to act is now: GA stability removes the uncertainty inherent to a preview and opens the door to more robust integrations within CI/CD pipelines.

Conclusion

Adopting new AI tools isn’t about using all of them — it’s about knowing which ones genuinely elevate the team’s work. Our experience with AWS AI Security Agent — and with other tools we’ve evaluated — confirms that pentesting has entered a new stage of maturity.

At Pyxis, AI-powered pentesting isn’t a promise or a lab experiment: it’s something we’re already applying to strengthen our clients’ applications. We understand that security in the cloud demands speed — but it also demands judgment, instinct, and context.

If your team is accelerating cloud deployments and you need security to keep pace rather than slow you down, we can help.