Skip to content
Confir.
Blog

AI Red-Teaming and the EU AI Act: What It Is and Where It Fits

Guide23 May 2026· 11 min read· 2,202 words

Red-teaming supports Art 9 risk management, Art 15 robustness, and Art 55 adversarial testing for systemic-risk GPAI providers. Not a standalone mandate.

Red-teaming is the practice of systematically attempting to find failures in an AI system by simulating adversarial behaviour — probing for unexpected outputs, bypassing safety controls, eliciting harmful content, or exploiting brittleness at the edges of training data. A red team for an AI system does what a penetration tester does for a network: assume the adversary's mindset, look for what went wrong that the builder did not anticipate, and document the results so they can be fixed.

The EU AI Act of 2024 (Regulation (EU) 2024/1689) does not use the term "red-teaming." It is a practice, not a statutory requirement — and that distinction matters for compliance framing. What the Act does require is a risk management system (Article 9), accuracy and robustness testing (Article 15), and — for providers of GPAI models with systemic risk — model evaluation against adversarial conditions (Article 55). Red-teaming is a technique that serves all three. Whether it is mandatory depends on what the system is and who is building it.


Where Red-Teaming Connects to the Act

Article 9: Risk Management System

Article 9 requires high-risk AI providers to establish, implement, document, and maintain a risk management system throughout the AI system's lifecycle. The RMS must identify and analyse known and reasonably foreseeable risks to health, safety, and fundamental rights; estimate and evaluate those risks; and adopt suitable risk-mitigation measures.

"Known and reasonably foreseeable risks" is the key phrase. A risk management system that only identifies risks the development team thought of without adversarial pressure is likely incomplete. Bias in edge cases, prompt injection in language model components, distribution shift failures, and systemic output manipulation are categories of risk that internal testing — which tends to confirm expected behaviour — systematically misses. Red-teaming is one of the most effective tools for surfacing this class of risk before deployment.

The Article 9 requirement is not satisfied by a static risk list assembled at design time. The Act requires the RMS to be updated based on new information arising from post-market monitoring (Article 72) and from the conformity assessment process. A red-team exercise conducted before launch feeds directly into the RMS's residual risk assessment and into the Article 11 technical documentation (Annex IV).

Article 15: Accuracy, Robustness, and Cybersecurity

Article 15 requires high-risk AI systems to achieve an appropriate level of accuracy and to be resilient against errors, faults, inconsistencies, and — critically — against adversarial attacks that could alter outputs or exploit the system. Providers must declare performance metrics and their limits.

Red-teaming directly evidences compliance with the adversarial robustness element of Article 15. A provider that can show documented red-team results — the attack types attempted, the conditions under which the system failed, the mitigations implemented, and the residual risk assessment — is in a stronger position than one relying on benchmark performance alone. Benchmarks measure average-case behaviour; red-teaming probes the failure modes that matter most for safety.

The cybersecurity element of Article 15 overlaps with the Article 55 adversarial robustness requirement for systemic-risk GPAI models (see below). For high-risk AI systems that are not GPAI-based, the Article 15 cybersecurity requirement does not mandate red-teaming by name, but it does require evidence that the system has been tested against realistic adversarial inputs relevant to its deployment context.

Article 55: Systemic-Risk GPAI Providers

Article 55 imposes specific obligations on providers of GPAI models classified as systemic-risk under Article 51 — those trained above the 10²⁵ FLOP threshold (or designated by the AI Office). Among those obligations: conduct model evaluations, including adversarial testing, to identify and mitigate systemic risks; assess and mitigate possible systemic risks at the Union level; and maintain cybersecurity protections for the model.

For systemic-risk GPAI providers, adversarial testing is as close to a mandatory red-teaming requirement as the Act gets. The Codes of Practice being developed under Article 56 will give operational content to "adversarial testing" for GPAI models — but even before they are finalised, Article 55 obliges systemic-risk GPAI providers to conduct structured evaluation against adversarial conditions. These obligations have applied since 2 August 2025.

The Article 55 requirement is expressly for the model provider — the entity that trains and places the GPAI model on the market. It does not extend as an Article 55 obligation to organisations that build applications on top of GPAI models. Those organisations may still conduct red-teaming as a practice under their own Article 9 RMS or Article 15 robustness obligations, but they are not subject to Article 55.


What Red-Teaming Covers in Practice

A well-scoped red-team exercise for an AI system typically covers several distinct failure mode categories.

Safety and harm elicitation. Attempting to get the system to produce outputs it is designed not to produce — harmful instructions, discriminatory content, misleading medical or legal advice. For customer-facing systems, this is frequently the highest-priority category.

Prompt injection and instruction following. For systems that accept unstructured text input (language model components, chatbots, document-processing tools), prompt injection attempts to override system instructions through user-supplied text. An agent that can be manipulated into ignoring its safety instructions through a crafted document it reads is a significant risk in professional deployments.

Bias and distributional failures. Probing performance on population subgroups, edge-case inputs, and inputs that differ from the training distribution. For high-risk AI systems in employment, credit, or law enforcement contexts, differential performance across protected groups is a legal exposure under both the AI Act (Annex III classification conditions) and EU anti-discrimination law.

Adversarial robustness. Testing whether the system's outputs can be manipulated by small, deliberate perturbations to input data — relevant primarily for computer vision and audio processing systems, but increasingly relevant for language model components as well.

Data leakage and privacy attacks. Probing whether the system inadvertently reveals training data, user data from other sessions, or proprietary information in its outputs. This overlaps with GDPR obligations for systems processing personal data.

Behavioural drift under distribution shift. Testing system behaviour on inputs that are plausible in deployment but underrepresented in the evaluation set. A credit-scoring model evaluated on a historical dataset may behave unexpectedly on populations whose financial profiles differ from the training population.


Who Should Conduct Red-Teaming

The credibility of a red-team exercise depends partly on who conducts it. Internal teams benefit from domain knowledge but are subject to confirmation bias and may unconsciously avoid scenarios that challenge design assumptions. External red teams — specialist security firms, academic researchers, domain experts unfamiliar with the system's internals — surface failure modes that internal teams miss precisely because they do not share the same mental model of what the system is supposed to do.

For high-risk systems subject to Article 43 conformity assessment, the robustness and accuracy documentation (Article 11 / Annex IV, section covering Article 15) will be examined by the market surveillance authority or, for biometric systems under Annex III point 1, by a notified body. An external red-team report provides independent evidence of adversarial testing that a self-generated internal assessment cannot.

For systemic-risk GPAI providers subject to Article 55, the AI Office may request access to model evaluation results. The GPAI Code of Practice being developed under Article 56 is expected to specify adversarial testing methodologies and documentation standards.

A practical structure for most organisations:

  • Internal red-team exercises on a rolling cycle for operational awareness and iteration feedback.
  • External red-team engagement before major releases, new use-case deployments, or at the conformity assessment stage.
  • Structured bug bounty or researcher disclosure programme for deployed customer-facing systems, providing continuous external coverage at lower cost than periodic engagements.

Documenting Red-Team Results for Compliance Purposes

A red-team exercise that is not documented does not exist for compliance purposes. The Article 11 / Annex IV technical documentation requires providers to describe the measures taken to identify and mitigate foreseeable risks — which includes adversarial testing results and the mitigations they prompted.

Documentation should capture:

  • Scope and methodology. What categories of failure were tested, what inputs were used, and how the exercise was structured. Enough detail for an auditor to assess the exercise's coverage.
  • Findings. The failure modes identified, ranked by severity and likelihood. Include the specific inputs or conditions that triggered each finding — sanitised where necessary to prevent misuse, but specific enough to demonstrate real testing.
  • Mitigations. What was done in response to each finding: model update, guardrail addition, output filter, operational constraint, or a documented residual risk acceptance with justification.
  • Residual risks. The risks that persist after mitigation, assessed against the Article 9 RMS framework. The provider must evaluate whether residual risks are acceptable given the intended purpose and the benefits of the system.
  • Iteration record. If the exercise prompted a model update or architecture change, the documentation should link the finding to the change and to any re-testing conducted.

A provider that can produce this documentation at a conformity assessment or market surveillance inspection is demonstrating a functional Article 9 RMS, not just a checklist one.


Red-Teaming Is Not a One-Time Gate

The most common misunderstanding about red-teaming in AI compliance contexts is treating it as a pre-deployment gate — something you do once, document, and close. That framing does not match the Act's requirements.

Article 9 requires the risk management system to be updated in light of new information throughout the AI system's lifecycle. Article 72 requires providers to actively collect post-market performance data and use it to update risk assessments. Article 73 requires reporting when incidents arise that meet the Article 3(49) serious-incident threshold.

A red-team exercise conducted once at launch and never repeated will miss failure modes that emerge as: the model's use expands to new populations or use cases; adversarial actors develop new attack patterns; the underlying GPAI model is updated by its provider; or the deployment context changes in ways that alter the risk profile.

For high-risk AI systems, the appropriate cadence is at minimum: a scoped exercise before each major update or new use-case deployment; a broader exercise annually; and triggered exercises when post-market monitoring or incident data reveals an unexpected failure pattern.


Frequently Asked Questions

Is red-teaming mandatory under the EU AI Act?

Not by name for most systems. Red-teaming is a practice, not a statutory term. However, the Article 9 risk management requirement for high-risk AI providers effectively requires adversarial testing as part of a credible RMS — the risk identification mandate cannot be satisfied by testing that only confirms expected behaviour. For systemic-risk GPAI providers, Article 55 explicitly requires adversarial testing as part of model evaluation. The distinction: high-risk AI providers have an implied obligation to test adversarially; systemic-risk GPAI providers have an explicit one.

What is the difference between Article 9 and Article 15 for red-teaming purposes?

Article 9 requires the risk management system to identify foreseeable risks — red-teaming is a tool for that identification. Article 15 requires the system to achieve robustness against adversarial attacks and to declare its performance limits — red-team results directly evidence that requirement. Both articles support the case for red-teaming; Article 15 is where the adversarial robustness finding lands in the compliance documentation.

Does Article 55 make red-teaming mandatory for all AI companies?

No. Article 55 applies only to providers of GPAI models classified as systemic-risk under Article 51 — those trained above the 10²⁵ FLOP threshold or designated by the AI Office. It does not apply to organisations that use GPAI models to build applications, nor to providers of AI systems that are not GPAI models. Those organisations may conduct red-teaming under their Article 9 and Article 15 obligations, but they are not subject to Article 55.

When did Article 55 obligations begin applying?

Chapter V GPAI obligations, including Article 55, have applied since 2 August 2025. GPAI model providers whose models were already on the market before that date have until 2 August 2027 to achieve compliance.

How should red-team results be documented for Article 11 purposes?

The Annex IV technical documentation must describe measures taken to identify and mitigate foreseeable risks. A red-team report should record the scope and methodology, the specific findings with severity assessment, the mitigations implemented in response, any residual risks accepted, and the iteration record linking findings to changes. This documentation is the evidence base for the Article 9 RMS and the Article 15 accuracy and robustness declaration.

What are the penalties for inadequate risk management under Article 9?

Non-compliance with Article 9 falls under Article 99(4): up to €15,000,000 or 3% of total worldwide annual turnover, whichever is higher. For SMEs and start-ups, Article 99(6) caps the fine at the lower of the percentage or the fixed amount. The high-risk obligations — including Article 9 — apply to stand-alone Annex III systems from 2 December 2027 (under the Digital Omnibus agreed May 2026).


Related guides

Manage your EU AI Act compliance in one place

Confir automates risk classification, technical documentation, and audit trails for any company. No consultants. No 6-month projects. 7-day free trial.

Start free trial →