What Is Article 10?

Article 10 of the EU AI Act sets out the data and data governance requirements for providers of high-risk AI systems. It applies before market placement — meaning your data quality documentation must exist before your AI system goes live, not after. The obligation covers three types of data:

  • Training data: the data used to develop and train the model
  • Validation data: the data used to tune hyperparameters and evaluate the model during development, for example to catch underfitting or overfitting
  • Test data: the data used for an independent evaluation of final performance before deployment

If any of these data sets are inadequate, biased, or poorly documented, that is not just a technical problem — it is a compliance failure that regulators can act on from 2 August 2026.

Who Must Comply with Article 10

Article 10 applies to providers of high-risk AI systems — the organisations that develop and place them on the EU market. You are a provider if you:

  • Built the AI model and sell it as a product or service in the EU
  • Integrated a pre-built model into your own product and placed it under your name or brand
  • Substantially modified a third-party AI system for a high-risk use case

Deployers (organisations using high-risk AI without developing it) have a narrower obligation under Article 26(4): to the extent they control the input data, they must ensure it is relevant and sufficiently representative in view of the system's intended purpose. In practice, that means watching for inputs that drift significantly from the conditions the system was built for, and raising the issue with the provider when they do.

Fine-tuning makes you a provider. If you take a foundation model (GPT-4, Claude, Llama, etc.) and fine-tune it on your own data for a high-risk use case — HR screening, credit scoring, medical decision support — you become a provider for that system. Article 10 applies to your fine-tuning data, not just the base model's training data.

The Four Core Data Requirements

1. Relevance and Representativeness (Article 10(3))

Training, validation, and test data must be:

  • Relevant to the AI system's intended purpose — the data must reflect the actual task the model is being asked to perform
  • Sufficiently representative of the operational environment — the data must reflect the population, context, and conditions under which the system will be used in practice
  • As free from errors as possible — this does not require perfection, but it does require documented quality controls
  • Complete in relation to the characteristics of the intended purpose — relevant variables, edge cases, and demographic groups must be present in appropriate proportions

The regulation does not mandate specific dataset sizes or statistical thresholds. What it requires is documented evidence that these criteria were considered and addressed for your specific use case.

2. Data Governance Practices (Article 10(2))

Beyond data quality, Article 10(2) requires documentation of how data was handled throughout the development process. Your data governance record must cover:

  • The origin and source of each dataset
  • How data was collected — surveys, clinical studies, web scraping, operational logs, third-party providers
  • Data preparation steps: labelling methodology, cleaning procedures, annotation guidelines, inter-annotator agreement rates where applicable
  • Assumptions made during data formulation — what the data is assumed to represent and the limits of that assumption
  • Assessment of data availability, quantity, and suitability relative to the task
  • The train/validation/test split and the rationale for it

This is often the most overlooked part of Article 10. A regulator reviewing your technical file will not just ask "what data did you use?" — they will ask "how do you know this data is appropriate for this purpose?" The governance record is your answer.
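
One way to keep the governance record reviewable is to hold it as structured data rather than free text. The sketch below is a minimal illustration in Python, not a prescribed format: the class name, the field names, and the example dataset are all hypothetical, chosen only to mirror the items listed above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class DatasetGovernanceRecord:
    """Governance record for one dataset (illustrative field names only)."""
    name: str
    origin: str
    collection_method: str
    preparation_steps: List[str]
    assumptions: List[str]
    suitability_assessment: str
    split: Dict[str, float]
    split_rationale: str

    def missing_fields(self) -> List[str]:
        """Names of fields left empty, so gaps are visible before review."""
        return [key for key, value in asdict(self).items() if not value]

    def split_is_consistent(self) -> bool:
        """True when the split fractions sum to 1, allowing for rounding."""
        return abs(sum(self.split.values()) - 1.0) < 1e-9


# Hypothetical example values, for illustration only.
record = DatasetGovernanceRecord(
    name="loan-applications-sample",
    origin="Internal operational logs",
    collection_method="Exported from the application system",
    preparation_steps=["Removed duplicate applications", "Labelled by two annotators"],
    assumptions=["Past applicants resemble future applicants"],
    suitability_assessment="Covers all product lines in scope",
    split={"train": 0.7, "validation": 0.15, "test": 0.15},
    split_rationale="Split by applicant so no person appears in two sets",
)

print("Missing fields:", record.missing_fields())
print("Split sums to 1:", record.split_is_consistent())
print(json.dumps(asdict(record), indent=2))
```

Storing one such record per dataset and per model version makes it straightforward to export the content into the technical file and to see which items are still undocumented.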

3. Bias Examination (Article 10(2)(f) and (g))

Article 10(2)(f) and (g) are the provisions that catch most organisations off guard. Point (f) requires that data be examined for possible biases that are likely to affect health and safety, harm fundamental rights, or lead to discrimination prohibited under Union law. Point (g) requires appropriate measures to detect, prevent, and mitigate the biases identified.

A bias examination must:

  1. Identify which protected characteristics are relevant to your specific use case. A credit scoring model has different bias risks than an HR screening tool or a clinical decision support system. The relevant characteristics might include age, gender, ethnicity, disability status, nationality, or other attributes depending on what the AI decides.
  2. Apply appropriate bias detection methods to training, validation, and test data. Common approaches include:
    • Distributional analysis — does the dataset proportionally represent the population the AI will affect?
    • Subgroup performance testing — does model accuracy, false positive rate, or false negative rate differ significantly across demographic groups?
    • Counterfactual fairness testing — does changing a protected attribute while keeping everything else constant change the model's output?
  3. Document findings — including any biases identified, their magnitude, and potential impact on the affected population.
  4. Record what was done in response — data augmentation, rebalancing, algorithmic fairness constraints, or an explanation of why the identified bias is acceptable given the use case and residual risk.

Bias examination is not bias elimination. The EU AI Act does not require zero bias — it requires documented analysis and proportionate response. An organisation that identifies bias, documents it, takes reasonable mitigation steps, and records the residual risk is in a far stronger compliance position than one that asserts "our model is not biased" with no evidence.
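
Subgroup performance testing, one of the detection methods listed above, can be sketched in a few lines. The example below is a minimal illustration in plain Python, assuming binary labels where 1 is the positive class; the function names and the toy data are hypothetical, and the Act does not prescribe this or any other metric.

```python
from collections import defaultdict


def subgroup_error_rates(y_true, y_pred, groups):
    """Return {group: {"n", "fpr", "fnr"}} for binary labels (1 = positive)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[group] = {
            "n": positives + negatives,
            # A rate is undefined (None) when the group has no such cases.
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


def max_gap(rates, metric):
    """Largest difference in a metric between any two groups."""
    values = [r[metric] for r in rates.values() if r[metric] is not None]
    return max(values) - min(values) if values else None


# Toy data for illustration only.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = subgroup_error_rates(y_true, y_pred, groups)
for group, r in sorted(rates.items()):
    print(group, r)
print("FPR gap:", max_gap(rates, "fpr"))
print("FNR gap:", max_gap(rates, "fnr"))
```

The gap values are the kind of magnitude figure that belongs in the findings. Whether a given gap is acceptable is a judgement for the risk assessment, not something the code can decide.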

4. Special Categories of Personal Data (Article 10(5))

Providers may, in exceptional circumstances, process special categories of personal data (health data, racial or ethnic origin, biometric data, and similar) to the extent strictly necessary to detect and correct bias in high-risk AI systems. This is a narrow exemption with strict conditions:

  • Bias detection and correction cannot be achieved effectively with other data, including synthetic or anonymised data
  • Processing must be explicitly limited to bias detection and correction purposes
  • Appropriate technical and organisational safeguards must be implemented, such as pseudonymisation and strict access controls
  • Data must be deleted once the bias has been corrected or its retention period ends, whichever comes first
  • The legal basis under GDPR must separately permit the processing

Note: the EU Digital Omnibus proposes extending this provision to all AI systems and models (not just high-risk), but it is not yet law. The current Article 10(5) exemption applies to high-risk AI systems only.

How Article 10 Connects to Your Other Obligations

Article | How Article 10 Feeds Into It
Article 9: Risk Management System | Bias findings from Article 10 become identified risks in the Article 9 risk register. Mitigation measures and residual bias risk must be documented in the risk management record. The testing required under Article 9 should cover the same demographic groups as the Article 10 bias examination.
Article 11: Technical Documentation | Annex IV of the EU AI Act specifies what the Article 11 technical file must contain. It calls for a description of the training datasets, information on their provenance, how the data was obtained and selected, and the labelling and cleaning methods used. Your Article 10 record, including the bias examination results, is the source for that part of the technical file.
Article 13: Transparency | Instructions for use must inform deployers of known limitations, including data coverage limitations or known bias characteristics that could affect performance in specific contexts or demographic groups.
GDPR | GDPR governs the lawfulness of personal data processing in training. Article 10 of the AI Act adds AI-specific quality requirements on top of, not instead of, GDPR obligations. You need both a GDPR legal basis and Article 10 quality documentation for any personal data in training.

What Your Article 10 Documentation Should Include

Your Article 10 compliance record — which forms part of the Article 11 technical file — should be structured around four sections:

Data Description

  • Dataset name, source, and collection date range
  • Volume, format, and language(s)
  • Intended use and why this data is appropriate for the intended purpose
  • How data was obtained (consent, public domain, operational logs, purchased, etc.)
  • Geographic coverage and jurisdiction-specific considerations

Data Preparation Record

  • Cleaning procedures applied and why
  • Labelling methodology and who did the labelling (human annotators, automated, hybrid)
  • Inter-annotator agreement metrics where applicable
  • How missing, corrupted, or outlier data was handled
  • Train/validation/test split percentages and rationale
  • Any data augmentation applied
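
For the inter-annotator agreement item, Cohen's kappa is one commonly used metric when two annotators label the same items. Naming it here is an assumption for illustration; the Act does not specify a metric. Kappa compares observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch, with a hypothetical function name and toy labels:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items")

    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels for illustration only.
annotator_1 = ["approve", "approve", "reject", "reject"]
annotator_2 = ["approve", "reject", "reject", "reject"]

print("Cohen's kappa:", cohens_kappa(annotator_1, annotator_2))
```

In this toy case the annotators agree on three of four items (p_o = 0.75) and chance agreement is 0.5, giving a kappa of 0.5. Recording the metric, the sample it was computed on, and how disagreements were resolved is what turns the number into documentation.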

Representativeness Assessment

  • The population on which the AI will be used
  • How the dataset represents that population
  • Known gaps, underrepresented groups, or out-of-distribution scenarios
  • Steps taken to address representativeness gaps (additional data collection, synthetic data, oversampling)
  • Remaining limitations and how they are disclosed to deployers
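
A simple starting point for the representativeness assessment is to compare each group's share of the dataset with its share of the population the system will be used on. The sketch below assumes you already have reference shares from a source you can cite; the function names, the age bands, the figures, and the 5-point tolerance are all hypothetical. The Act sets no numeric threshold, so any tolerance you use needs its own justification.

```python
from collections import Counter


def representation_gaps(dataset_groups, reference_shares):
    """Dataset share minus reference share for each group in the reference."""
    if not dataset_groups:
        raise ValueError("Dataset is empty")
    n = len(dataset_groups)
    counts = Counter(dataset_groups)
    return {
        group: counts.get(group, 0) / n - share
        for group, share in reference_shares.items()
    }


def underrepresented(gaps, tolerance=0.05):
    """Groups whose dataset share falls short of the reference by more than tolerance."""
    return sorted(group for group, gap in gaps.items() if gap < -tolerance)


# Toy data for illustration only: group label per record, and assumed population shares.
dataset_groups = ["18-34"] * 50 + ["35-54"] * 40 + ["55+"] * 10
reference_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

gaps = representation_gaps(dataset_groups, reference_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
print("Underrepresented:", underrepresented(gaps))
```

Here the "55+" group makes up 10% of the dataset against an assumed 35% of the population, which is the kind of known gap the record should state along with the steps taken to address it.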

Bias Examination Record

  • Protected characteristics examined and why they are relevant to this use case
  • Bias detection methods used
  • Findings: biases identified, their statistical magnitude, and the affected groups
  • Mitigation measures applied (data rebalancing, algorithmic constraints, etc.)
  • Residual bias characterisation — what bias remains after mitigation and why it is acceptable or how it is managed
  • Link to Article 9 risk register entries for bias-related risks

Five Common Article 10 Mistakes

1. Treating data documentation as a one-time exercise. Article 10 applies to the data used at each model version. If you retrain with new data, update the documentation. Regulators examining a system that has had multiple training iterations will expect version-controlled data records.

2. Describing data without examining it. A dataset inventory is not Article 10 compliance. The regulation requires documented evidence of examination — what you looked for, what you found, what you did about it. Asserting "data is representative" without evidence is an Article 10 gap.

3. Assuming GDPR compliance covers Article 10. GDPR determines whether you can process personal data at all. Article 10 sets quality and governance standards for that data. A dataset can be GDPR-compliant and still fail Article 10 if it is insufficiently representative or unexamined for bias.

4. Fine-tuning without new Article 10 documentation. When you fine-tune a foundation model on proprietary data, that fine-tuning data is subject to Article 10. The base model provider's GPAI transparency documentation covers their pre-training — your fine-tuning data is your responsibility.

5. Disconnecting Article 10 from Article 9. A common failure is a bias examination that identified risks alongside a risk register with no corresponding entries. Article 10 findings must flow into Article 9: every bias finding should map to a risk register entry that records its mitigation and residual risk. Separate, unlinked documents create compliance gaps that are immediately visible to a trained reviewer.
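
The fifth mistake lends itself to an automated check. The sketch below assumes bias findings and risk register entries are kept as simple records that share a risk identifier; the key names ("finding_id", "risk_id") and the example IDs are hypothetical and would need to match however your own documents are stored.

```python
def unlinked_findings(bias_findings, risk_register):
    """Return IDs of bias findings with no matching entry in the risk register."""
    register_ids = {entry["risk_id"] for entry in risk_register}
    return [
        finding["finding_id"]
        for finding in bias_findings
        # A finding is unlinked if it has no risk ID or points at a missing entry.
        if not finding.get("risk_id") or finding["risk_id"] not in register_ids
    ]


# Hypothetical records for illustration only.
bias_findings = [
    {"finding_id": "BF-01", "risk_id": "R-12"},
    {"finding_id": "BF-02", "risk_id": None},
    {"finding_id": "BF-03", "risk_id": "R-99"},
]
risk_register = [
    {"risk_id": "R-12", "description": "Higher false negative rate for one age band"},
    {"risk_id": "R-15", "description": "Sparse coverage of one region"},
]

print("Findings without a risk register entry:", unlinked_findings(bias_findings, risk_register))
```

Running a check like this whenever either document changes keeps the two records from drifting apart between model versions.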

Aurora Trust generates Article 10 data governance documentation as part of the compliance document pack, including the data description, representativeness assessment, and bias examination record, structured to feed directly into the Article 11 technical file. Starting at €49/month.