EU AI Act Article 51: How GPAI Models Get Classified as Systemic Risk

Annex Guide · 23 May 2026 · 16 min read · 3,237 words

Article 51 classifies GPAI models as systemic risk: 10²⁵ FLOP presumption or Commission decision. In force 2 August 2025. Fines up to €15M or 3%.

A general-purpose AI (GPAI) model is not automatically a model with systemic risk. Article 51 of Regulation (EU) 2024/1689 sets out exactly when that classification applies: the criteria, the legal presumption, and the Commission's discretionary power to extend it. Getting this wrong matters: a provider that misclassifies its model as below the systemic-risk threshold and skips the Article 55 obligations faces fines of up to €15 million or 3% of total worldwide annual turnover, whichever is higher, under Article 101.

GPAI rules have been in force since 2 August 2025. This is not a future deadline — it is live now.

This guide covers what Article 51 actually does, how the classification criteria work, what Annex XIII says, how the 10²⁵ FLOP presumption operates in practice, and what it means for the small number of companies that are genuinely GPAI model providers. If you build or distribute software that uses a third-party foundation model — which describes most companies in the EU AI Act's scope — Article 51 almost certainly does not apply to you directly. That context matters and is addressed below.


What Article 51 Does (and Does Not Do)

Article 51 is a classification article. It defines the conditions under which a GPAI model is designated as a model with systemic risk. It does not impose obligations on its own: the actual obligations for systemic-risk GPAI providers sit in Article 55 (model evaluations including adversarial testing, assessment and mitigation of systemic risks, serious-incident reporting, and cybersecurity protection). Article 52 sets out the notification procedure once the classification criteria are met.

This distinction matters for reading the Chapter V framework clearly:

  • Article 53 — obligations for all GPAI model providers (technical documentation, information for downstream providers, a copyright compliance policy, a publicly available summary of training data).
  • Article 51 — the gateway: which models additionally carry systemic risk.
  • Article 55 — the extra obligations that attach once the Article 51 classification is made.

A provider subject only to Article 53 has a lighter load. Once Article 51 applies, Article 55 activates on top of it.


The Two Routes to Systemic-Risk Classification

Article 51(1) sets out two independent grounds for classification.

Route 1: High-Impact Capabilities

A GPAI model is classified as having systemic risk if it has high-impact capabilities, evaluated on the basis of appropriate technical tools, methodologies, indicators, and benchmarks. The Article does not enumerate those tools — that task falls to the Commission and to the codes of practice developed under Article 56 — but it anchors classification to capability, not to market size or deployment scale alone.

The key technical presumption sits in Article 51(2): a GPAI model is presumed to have high-impact capabilities when the cumulative amount of compute used for its training, measured in floating-point operations (FLOP), is greater than 10²⁵. The presumption is rebuttable. A provider whose model crosses that threshold can argue to the Commission that the model does not in fact exhibit high-impact capabilities, but the burden is on the provider, not the regulator.

Why compute as a proxy? Training compute correlates with emergent capabilities that are difficult to assess directly before deployment. The 10²⁵ FLOP figure broadly captures the frontier of models that have demonstrated qualitatively new behaviours — reasoning, planning, multi-step instruction-following at scale — while excluding the vast majority of models in commercial use. To give a sense of the order of magnitude: GPT-3 (2020) used approximately 3.1 × 10²³ FLOP. Models trained at 10²⁵ FLOP are, as of 2025, produced by a handful of organisations globally.
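As a rough illustration of where a model sits relative to the threshold, the sketch below uses the widely cited approximation of about 6 FLOP per parameter per training token for dense transformer training. That heuristic, the function names, and the GPT-3 scale figures (roughly 175 billion parameters and 300 billion tokens, as publicly reported) are assumptions for illustration; the Regulation does not prescribe an estimation method.

```python
# Rough training-compute estimate. The 6 * N * D rule of thumb is an
# assumption for illustration, not a method taken from the Regulation.

FLOP_PRESUMPTION_THRESHOLD = 1e25  # presumption threshold discussed above


def estimate_training_flop(parameters: float, training_tokens: float) -> float:
    """Approximate cumulative training compute for a dense transformer."""
    return 6.0 * parameters * training_tokens


def exceeds_presumption_threshold(training_flop: float) -> bool:
    """True only when compute is strictly greater than 10^25 FLOP."""
    return training_flop > FLOP_PRESUMPTION_THRESHOLD


# GPT-3 at its publicly reported scale (assumed: 175e9 parameters, 300e9 tokens).
gpt3_flop = estimate_training_flop(175e9, 300e9)
print(f"{gpt3_flop:.2e}")                        # 3.15e+23
print(exceeds_presumption_threshold(gpt3_flop))  # False
```

The estimate lands close to the 3.1 × 10²³ figure quoted above, about one and a half orders of magnitude below the presumption threshold. A real assessment would rely on the provider's own training logs rather than a heuristic.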

Route 2: Commission Decision

The second route is discretionary. The Commission may decide, either on its own initiative or following a qualified alert from the scientific panel (the independent scientific body established under Article 68), that a GPAI model has capabilities equivalent to those described in Route 1, or has impact that is equivalent even if the training compute figure is formally below threshold.

The Commission's decision under this route is based on the criteria in Annex XIII. This is the mechanism that future-proofs the classification: if a new architectural approach produces high-impact capabilities at lower compute, the Commission can capture it without waiting for a legislative amendment.


Annex XIII: The Classification Criteria

Annex XIII lists the factors the Commission uses when assessing whether a GPAI model has — or is reasonably expected to have — high-impact capabilities, particularly for the Route 2 discretionary decision. The criteria include:

  • Number of parameters of the model.
  • Quality and size of the training dataset, including whether it spans multiple modalities (text, image, code, audio, video).
  • Compute used for training (the same metric as the Route 1 presumption, but here as one factor among several rather than a binary threshold).
  • Input and output modalities — a multimodal model that reasons across text, vision, and audio presents different risk profiles than a single-modality model.
  • Benchmarks and evaluations on general and specific capability tasks, particularly those measuring reasoning, planning, autonomy, or capability to assist in dangerous activities.
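  • Reach of the model, including the number of registered business users and end users in the EU. Annex XIII presumes a high impact on the internal market once a model has been made available to at least 10,000 registered business users established in the Union. A model used by millions of EU residents across critical sectors has a different risk profile than one deployed in a controlled industrial setting.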
  • Model architecture and the degree to which the model can be fine-tuned or adapted for downstream uses.
  • Systemic effects already observed, such as dependence of critical services on a single model.

The list is not exhaustive. Annex XIII reflects the state of technical understanding at the time of adoption; the Commission may adopt delegated acts under Article 51(3) to amend the thresholds, indicators, and benchmarks in light of technological developments. This is the legislative mechanism to adjust the 10²⁵ FLOP figure upward or downward without reopening the full Regulation.


How the 10²⁵ FLOP Presumption Works in Practice

A worked example illustrates how the threshold operates.

A US-based frontier model provider trains a large multimodal model. Internal training logs show cumulative compute of approximately 8 × 10²⁵ FLOP — well above threshold. The model is offered via API to business customers globally, including several thousand registered EU business users.

Under Article 51(2), the model is presumed to have high-impact capabilities and therefore falls under Article 51(1)(a). The provider has two options:

  1. Accept the classification and proceed to notify the Commission under Article 52, then fulfil the Article 55 obligations: perform model evaluations, including adversarial testing; assess and mitigate systemic risks; track, document, and report serious incidents to the AI Office; and ensure an adequate level of cybersecurity protection.

  2. Challenge the presumption by providing the Commission with evidence that the model, despite exceeding the compute threshold, does not exhibit high-impact capabilities. In practice, given where the 10²⁵ level sits in the current capability distribution, a successful rebuttal would require demonstrating narrow or constrained capabilities inconsistent with the systemic risk definition — an uphill task for a large multimodal model.

The provider cannot simply ignore the threshold. Article 52(1) requires GPAI model providers to notify the Commission without delay, and in any event within two weeks, once their model meets the Article 51 criteria or it becomes known that it will. The notification triggers Commission review, not self-certification.

Now consider a second scenario: a European AI research institute trains a model using only 4 × 10²⁴ FLOP — below the presumption threshold. The model, however, shows unusually strong performance across a range of capability evaluations, including autonomous planning tasks. The scientific panel reviews public benchmarks and issues a qualified alert to the Commission under Article 90. The Commission opens a discretionary assessment under Article 51(1)(b), applying the Annex XIII criteria. If the assessment concludes that the model has capabilities equivalent to a model at or above the threshold, it issues a decision classifying the model as having systemic risk — and Article 55 obligations attach from that point.
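The two scenarios above can be sketched as a small decision function. This is a simplified model of the two routes, not a legal test: the class, field, and function names are hypothetical, and a real classification turns on the Commission's assessment rather than on boolean flags.

```python
from dataclasses import dataclass
from enum import Enum

FLOP_PRESUMPTION_THRESHOLD = 1e25


class Classification(Enum):
    NOT_SYSTEMIC_RISK = "not classified as systemic risk"
    PRESUMED_HIGH_IMPACT = "route 1: presumed high-impact capabilities"
    COMMISSION_DECISION = "route 2: Commission decision"


@dataclass
class ModelFacts:
    # All fields are illustrative simplifications.
    training_flop: float
    presumption_rebutted: bool = False  # Commission accepted the provider's arguments
    commission_decision: bool = False   # Commission designated the model (Annex XIII)


def classify(facts: ModelFacts) -> Classification:
    """Apply route 1 (compute presumption) first, then route 2."""
    above_threshold = facts.training_flop > FLOP_PRESUMPTION_THRESHOLD
    if above_threshold and not facts.presumption_rebutted:
        return Classification.PRESUMED_HIGH_IMPACT
    if facts.commission_decision:
        return Classification.COMMISSION_DECISION
    return Classification.NOT_SYSTEMIC_RISK


# Scenario 1: frontier provider, about 8e25 FLOP.
frontier = classify(ModelFacts(training_flop=8e25))
# Scenario 2: research institute, about 4e24 FLOP, designated after an alert.
institute = classify(ModelFacts(training_flop=4e24, commission_decision=True))

print(frontier.value)
print(institute.value)
```

Either outcome other than NOT_SYSTEMIC_RISK leads to the same place: the Article 55 obligations attach.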


Who This Actually Applies To

Article 51 targets GPAI model providers — the organisations that train and place general-purpose AI models on the market or into service under their own name or trademark, as defined in Article 3(3). In the current market structure, this is a small number of organisations: the frontier labs training large language models, large multimodal models, and equivalent systems at scale.

If you are a company that:

  • integrates a third-party API (OpenAI, Anthropic, Google, Mistral, etc.) into your product,
  • fine-tunes an existing open-weights model for a specific use case,
  • builds an application or workflow on top of a foundation model,

then you are almost certainly a downstream provider or deployer, not a GPAI model provider under Article 53, and Article 51's classification machinery does not apply to you. Your obligations as a downstream provider are governed by Articles 16 and 26 (and, if your system is high-risk, by Articles 9 through 15). You are not classifying the underlying GPAI model — that is the upstream provider's responsibility.

The critical compliance question for most companies is whether they have properly documented the GPAI model(s) they rely on, their relationship to those models, and whether those upstream models have notified the Commission of systemic-risk status. That relationship — which GPAI model underpins which registered system — is a documented fact worth maintaining in your AI inventory.
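One way to keep that relationship on record is a small structured entry per system. The sketch below is only an illustration: the record shape, field names, and example values are hypothetical and do not reflect any prescribed format or a particular tool's schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GpaiRelationship:
    """Hypothetical inventory entry linking one AI system to its upstream model."""
    system_name: str
    upstream_model: str
    upstream_provider: str
    our_role: str  # e.g. "downstream provider" or "deployer"
    # None means the upstream systemic-risk status has not been confirmed yet.
    upstream_systemic_risk: Optional[bool] = None


def needs_follow_up(records: List[GpaiRelationship]) -> List[str]:
    """Return the systems whose upstream systemic-risk status is still unknown."""
    return [r.system_name for r in records if r.upstream_systemic_risk is None]


# Example values are invented for illustration.
inventory = [
    GpaiRelationship("cv-screening-assistant", "example-model-1", "Example Labs",
                     "downstream provider", upstream_systemic_risk=True),
    GpaiRelationship("support-chatbot", "example-model-2", "Example Labs",
                     "deployer"),
]

print(needs_follow_up(inventory))  # ['support-chatbot']
```

Using None rather than False for an unconfirmed status keeps "not checked" distinct from "checked and not systemic risk".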


The Chapter V Timeline: Already in Force

GPAI rules (Chapter V, Articles 51–56) applied from 2 August 2025. This is distinct from the high-risk timeline:

  • 2 August 2025 — GPAI obligations live, including the Article 51 classification criteria, the Article 52 notification procedure, Article 53 general GPAI provider obligations, and Article 55 systemic-risk obligations.
  • 2 December 2027 — Stand-alone high-risk AI systems (Annex III list: recruitment, credit, biometrics, etc.) under the Digital Omnibus agreed May 2026.
  • 2 August 2028 — High-risk AI as safety components in regulated products (Annex I).

Frontier model providers who crossed the 10²⁵ FLOP threshold before the 2 August 2025 date should already have notified the Commission. Providers who cross it after that date must notify under Article 52(1) without delay, and in any event within two weeks of meeting the threshold or learning that they will meet it.
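Article 52(1) asks for notification without delay and in any event within two weeks after the threshold is met or it becomes known that it will be met. A minimal sketch of that outer limit follows, assuming a simple count of 14 calendar days (the legal computation of the period may differ) and a hypothetical function name:

```python
from datetime import date, timedelta

# Assumption: the two-week window is counted as 14 calendar days.
NOTIFICATION_WINDOW = timedelta(weeks=2)


def latest_notification_date(threshold_known_on: date) -> date:
    """Outer limit for notifying; 'without delay' still applies before it."""
    return threshold_known_on + NOTIFICATION_WINDOW


print(latest_notification_date(date(2025, 9, 1)))  # 2025-09-15
```

The date passed in should be the earlier of the day the threshold is actually met and the day the provider knows it will be met, for example when a planned training run is sized above 10²⁵ FLOP.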


Article 51 in the Broader GPAI Framework

It helps to see how Article 51 connects to the surrounding articles:

Article 53 (all GPAI providers) requires, regardless of systemic risk: a technical documentation package (per Annex XI and XII), information to downstream providers about capabilities and limitations, a policy to comply with EU copyright law, and a publicly accessible summary of training data. These obligations apply to every GPAI model provider — not just those above the 10²⁵ FLOP threshold.

Article 51 then asks the next question: does this model additionally have high-impact capabilities? If yes, the classification of systemic risk attaches.

Article 52 is the procedural consequence: the provider must notify the Commission, which may then conduct its own assessment and issue a formal classification decision.

Article 55 is where the substantive extra burden lands: model evaluations (including adversarial testing), assessment and mitigation of systemic risks, serious incident reporting to the AI Office, and cybersecurity protection adequate to the model's risk profile.

Article 56 establishes codes of practice through which providers, in cooperation with the AI Office, develop the detailed methodologies, metrics, and benchmarks that flesh out how Articles 51 and 55 are applied in practice.

The systemic-risk classification under Article 51 is therefore not an end-point — it is the gateway that activates a richer and more demanding compliance track.


Penalties for GPAI Providers

GPAI-specific enforcement runs through Article 101, not Article 99. The Commission, acting through the AI Office, may impose fines on GPAI model providers of up to €15 million or 3% of total worldwide annual turnover in the preceding financial year, whichever is higher. This applies to failures to comply with the GPAI obligations, including the Article 52 notification obligation that follows from meeting the Article 51 criteria and Article 55's substantive requirements.
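The "whichever is higher" rule is simple arithmetic. The sketch below computes the ceiling only, under the figures stated above; the function name is hypothetical, and the actual fine in any case is set by the Commission below or at that ceiling.

```python
FIXED_CAP_EUR = 15_000_000
TURNOVER_PERCENT = 3


def max_fine_eur(worldwide_annual_turnover_eur: float) -> float:
    """Upper bound of the fine: the higher of EUR 15 million and 3% of turnover."""
    turnover_based = worldwide_annual_turnover_eur * TURNOVER_PERCENT / 100
    return max(FIXED_CAP_EUR, turnover_based)


# Smaller provider: 3% of EUR 200 million is EUR 6 million, so the fixed cap governs.
print(f"{max_fine_eur(200_000_000):,.0f}")    # 15,000,000
# Larger provider: 3% of EUR 2 billion is EUR 60 million, above the fixed cap.
print(f"{max_fine_eur(2_000_000_000):,.0f}")  # 60,000,000
```

The crossover sits at €500 million of turnover: below it the €15 million figure is the ceiling, above it the 3% figure is.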

Article 99 penalties (the general enforcement track used by national authorities) also apply where relevant — for instance, if a GPAI model is embedded in a high-risk system and the provider has breached requirements under Articles 16 through 27 in that capacity.


Open-Weights Models and the Systemic-Risk Question

The Article 51 framework raises a specific question for open-weights models — those whose weights are made publicly available and can be downloaded, run, and fine-tuned without restriction. If the model was trained using more than 10²⁵ FLOP, the presumption of high-impact capabilities applies regardless of how the model is distributed.

The practical question is who counts as the provider. Under Article 3(3), a provider is the entity that develops and places a GPAI model on the market or puts it into service under its own name or trademark. An organisation that releases open weights under its own name — even without charging for access — is placing the model on the market in the sense of making it available to users. The open-weights release does not dissolve the provider's obligations under Chapter V.

What changes is the downstream picture. Once an organisation downloads and deploys open weights in its own product, Article 25 may apply: if it substantially modifies the model or deploys it in a way that changes its intended purpose, it may become a provider in its own right for that use case. But the original training organisation retains its Article 53 obligations, and if the training compute exceeds the threshold, its Article 55 obligations too.

The codes of practice developed under Article 56 are expected to address the specific implementation challenges for open-weights releases — for instance, how to conduct post-release adversarial testing on a model that anyone can download and run, or how to report incidents when the user base is unbounded. The Commission's AI Office has indicated this is an active area in the drafting process.

The Role of Codes of Practice in Defining the Benchmarks

Article 56 requires the AI Office to encourage and facilitate the drawing up of codes of practice for Chapter V obligations. Adherence is voluntary, but the codes are the main mechanism through which the technical detail of Articles 51 and 55 gets operationalised. A GPAI model provider may rely on an approved code of practice to demonstrate compliance with the corresponding obligations until a harmonised standard is published.

For Article 51 specifically, the codes are expected to develop:

  • Evaluation frameworks — the specific benchmarks and testing protocols used to assess whether a model has high-impact capabilities beyond the FLOP proxy. This matters because the FLOP threshold is a presumption, not a definition: the Commission needs agreed methodologies to assess models that challenge the presumption or that fall below threshold but exhibit unexpected capabilities.
  • Capability descriptors — definitions of what "high-impact" means operationally across different modalities and use contexts. A model capable of autonomous multi-step planning presents different risks from one that generates fluent text without reasoning capabilities.
  • Documentation standards — standardised formats for the information that providers must submit under Article 52 and maintain under Article 55, making the Commission's review process more tractable.

Providers should monitor the codes of practice process through the AI Office's published working documents. The first codes were drafted with input from frontier model providers and civil society during 2025, with the Commission reviewing them for adequacy. Following a completed code does not exempt a provider from its legal obligations, but it provides a clear safe-harbour path.

What Happens After Notification: the Article 52 Procedure

Once a provider concludes that Article 51 applies — either because the model crosses the FLOP threshold or because the Commission has issued a decision — it must notify the Commission under Article 52. The notification triggers a formal process.

The Commission reviews the notification and may request additional information. It may also initiate its own assessment if it believes a model should be classified as having systemic risk even absent a notification. After review, the Commission issues a decision confirming or denying the classification. A confirmed classification activates Article 55 in full.

Providers who disagree with a Commission classification decision can challenge it through the standard EU administrative and judicial review process. The scientific panel established under Article 68 can issue qualified alerts that trigger the Commission's assessment but does not itself make binding classification decisions — that authority sits with the Commission.

The notification requirement is not a registration of the model in the EU AI database in the sense of Article 49 (which governs registration of high-risk AI systems by their providers and by certain deployers). It is a distinct regulatory act addressed to the Commission directly, through the AI Office.

Distinguishing Article 51 Classification from Article 49 Registration

A persistent source of confusion — including in the original version of this article — is conflating Article 51 (GPAI systemic-risk classification) with Article 49 (EU database registration for high-risk AI systems). They are entirely separate obligations:

  • Article 49 requires providers of high-risk AI systems (Annex III use cases: recruitment, credit, biometrics, etc.), and deployers that are public authorities, to register in the EU database before the system is placed on the market or put into service. This applies to the specific downstream system, not to the underlying GPAI model.
  • Article 51 sets the criteria for whether a GPAI model itself is classified as having systemic risk. The provider notifies under Article 52; the Commission assesses and decides.

A system that uses a GPAI model in an Annex III application may trigger both: the downstream provider registers the high-risk AI system under Article 49, and the upstream GPAI model provider may separately have notified the Commission under Article 52. These are obligations on different entities at different points in the supply chain.

What Confir Records

Confir does not assess whether an upstream GPAI model meets the Article 51 systemic-risk threshold — that is a determination the GPAI model provider makes (or the Commission makes on its initiative). What Confir tracks is the GPAI relationship of AI systems registered in your inventory: which foundation model underpins a given system, whether the upstream provider has systemic-risk status, and how that relationship affects the compliance posture of your registered systems.

This is practically useful for downstream providers and deployers who need to document the provenance of the AI capabilities in their stack — part of the broader governance picture that the Article 11 / Annex IV technical documentation requires.


A Note on Scope for Companies Below the Threshold

The 10²⁵ FLOP presumption is set high deliberately. The vast majority of AI systems in commercial use — including most fine-tuned models, RAG pipelines, classification systems, and domain-specific models — sit many orders of magnitude below it. For those systems, Article 51 is informative rather than directly actionable: understanding how the classification works helps you understand the compliance status of the foundation models you depend on, and helps you answer "what is the GPAI relationship of this system?" when it comes up in your technical documentation.

The provisions to pay closer attention to, for most companies, are Article 6 (high-risk classification) and Annex III (the use-case list). If your AI system touches recruitment, credit scoring, biometric identification, or any of the other Annex III areas (there are eight in total), that is where your compliance burden concentrates, under a deadline of 2 December 2027 for stand-alone systems.

