
Why Leading AI Systems Struggle with EU AI Act Compliance
Challenges in Regulatory Compliance and the Role of Retrieval-Augmented Models
The rapid evolution of artificial intelligence (AI) has given rise to powerful large language models (LLMs) such as ChatGPT, DeepSeek, and Mistral, which excel in generating human-like text across diverse domains. These general-purpose AI systems, while transformative, face significant obstacles in complying with the European Union’s AI Act, a landmark regulation aimed at ensuring trustworthy AI through stringent requirements for high-risk systems. These requirements include transparency, robust data governance, risk management, and accountability. This article examines the key compliance challenges for leading AI systems and how retrieval-augmented generation (RAG) models, which integrate curated data retrieval with generative capabilities, offer a solution to address these regulatory hurdles.
The EU AI Act: A Framework for Trustworthy AI
The EU AI Act, proposed in 2021 and finalized in 2024, establishes a risk-based framework for regulating AI systems. It categorizes AI applications into four risk levels—unacceptable, high, limited, and minimal—with high-risk systems subject to the most rigorous obligations. These include:
- Transparency: Clear documentation of data sources and system capabilities.
- Data Governance: Compliance with data protection laws, such as the General Data Protection Regulation (GDPR). Tools for GOAD Knowledge Clusters ensure data is cleaned and chunked properly, enhancing governance and auditability (see the sketch after this list).
- Risk Management: Identification and mitigation of systemic risks, including bias and misinformation. Frequent updates of GOAD Knowledge Clusters help keep data current and reduce risks related to outdated or biased information.
- Accountability: Mechanisms to ensure human oversight and traceability of outputs.
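To make the cleaning and chunking step concrete, the sketch below shows one minimal way it might be done. The function is purely illustrative and not part of any GOAD tooling; it normalizes raw text, splits it into fixed-size chunks, and attaches provenance metadata so each chunk stays traceable to its origin, which is the property the Act's governance and transparency obligations reward.

```python
import hashlib
import re

def clean_and_chunk(text, source_url, chunk_size=500):
    """Normalize raw text, split it into fixed-size chunks, and attach
    provenance metadata so every chunk remains traceable to its source."""
    cleaned = re.sub(r"\s+", " ", text).strip()  # collapse whitespace and markup noise
    chunks = []
    for start in range(0, len(cleaned), chunk_size):
        body = cleaned[start:start + chunk_size]
        chunks.append({
            "text": body,
            "source": source_url,  # where the data came from, for audit trails
            "checksum": hashlib.sha256(body.encode()).hexdigest(),  # tamper evidence
        })
    return chunks
```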
General-purpose AI systems, due to their broad applicability and scale, are often classified as high-risk under the AI Act, particularly when deployed in sensitive domains like healthcare, finance, or government (European Commission).
Compliance Challenges for Leading AI Systems
General-purpose AI systems like ChatGPT, DeepSeek, and Mistral are trained on massive, heterogeneous datasets scraped from the internet, enabling their versatility but also creating significant compliance challenges. Below are the primary reasons these systems fall short of the AI Act’s requirements:
Opaque Data Sources and Transparency Issues
The training datasets for models like ChatGPT and DeepSeek often comprise billions of web pages, social media posts, and other unstructured sources. This makes it nearly impossible to provide detailed documentation of data origins, as required by the AI Act. The inability to trace data sources also complicates compliance with copyright and data protection laws, raising significant governance challenges (Artificial Intelligence and Law).
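By contrast, the per-source documentation the Act envisions is straightforward to produce for a curated corpus. The record below is a hypothetical sketch of such documentation; the field names are illustrative assumptions, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class DatasetRecord:
    source: str         # origin of the data (URL, archive, vendor)
    license: str        # usage terms, relevant to copyright compliance
    collected_on: str   # ISO date of acquisition
    contains_pii: bool  # flag for GDPR-relevant content

# Hypothetical entry for a single curated source.
record = DatasetRecord("https://example.org/corpus", "CC-BY-4.0", "2024-05-01", False)
print(json.dumps(asdict(record), indent=2))
```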
Dynamic Use Cases and Risk Assessment Challenges
Unlike task-specific AI systems, general-purpose models are deployed across diverse contexts, from creative writing to legal analysis, making it difficult to anticipate and mitigate all potential risks. The AI Act requires high-risk systems to undergo comprehensive risk assessments, but the open-ended nature of LLMs complicates this process. A 2024 analysis emphasizes how unpredictable deployment scenarios pose a challenge for risk-based compliance (Digital Society).
Ethical Concerns and Bias Propagation
The reliance on uncurated internet data introduces biases and ethical risks that are difficult to detect, let alone correct, at the scale of web-sized training sets, challenging compliance with the AI Act's mandates for fairness and non-discrimination (Cambridge Forum on AI: Law and Governance).
Cybersecurity and Data Privacy Vulnerabilities
Large-scale AI systems are prime targets for cyberattacks, and recent incidents highlight their vulnerabilities. Security flaws in dataset management, for example, can lead to non-compliance with the GDPR and national security standards (Communications of the ACM).
The Advantages of Retrieval-Augmented Generation (RAG) Models
Retrieval-augmented generation (RAG) models, which combine LLMs with curated data retrieval systems, offer a compelling alternative to general-purpose AI. By retrieving relevant, high-quality information from structured databases before generating responses, RAG models address many of the compliance and performance challenges faced by traditional LLMs. The core pattern is sketched below, followed by its key advantages:
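The snippet below is a deliberately naive illustration of the retrieve-then-generate pattern: keyword overlap stands in for a real ranking model, and `generate` is a placeholder for any LLM call; none of it reflects a specific vendor's API.

```python
def retrieve(query, corpus, k=2):
    """Rank curated chunks by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda d: len(terms & set(d["text"].lower().split())),
                    reverse=True)
    return ranked[:k]

def answer(query, corpus, generate):
    """Retrieve curated context, then pass it to a generator with sources attached."""
    docs = retrieve(query, corpus)
    context = "\n".join(d["text"] for d in docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    # The answer travels with its documented provenance.
    return generate(prompt), [d["source"] for d in docs]
```

Because the generator sees only retrieved, documented material, every answer travels with the sources that produced it, which is the property the advantages below build on.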
Improved Transparency and Data Governance
RAG models rely on curated, verifiable datasets, enabling providers to document data sources and ensure compliance with the GDPR and the AI Act's transparency requirements (European Data Protection Supervisor).
Minimized Bias and Hallucinations
By grounding responses in retrieved documents, RAG models significantly reduce hallucinations (incorrect or fabricated outputs that plague general-purpose LLMs). This aligns with the AI Act's requirements for accountability and trustworthiness (arXiv).
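One common way to operationalize this grounding, sketched below under the same naive keyword-overlap assumption as the earlier snippet, is to abstain when no curated chunk supports the query. The `min_overlap` threshold is an illustrative parameter, not a standard.

```python
def grounded_answer(query, corpus, generate, min_overlap=2):
    """Abstain instead of guessing when no curated chunk supports the query."""
    terms = set(query.lower().split())
    best = max(corpus,
               key=lambda d: len(terms & set(d["text"].lower().split())),
               default=None)
    support = len(terms & set(best["text"].lower().split())) if best else 0
    if support < min_overlap:
        # Abstaining is preferable to fabricating an unsupported answer.
        return "No sufficiently relevant source found.", []
    prompt = f"Answer strictly from this source:\n{best['text']}\n\nQuestion: {query}"
    return generate(prompt), [best["source"]]
```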
Alignment with Regulatory Requirements
The structured nature of RAG models makes them well-suited for high-risk applications under the AI Act, such as medical diagnostics or legal decision-making. By limiting data to curated sources, RAG models simplify risk assessments and mitigation strategies (arXiv).
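In practice, limiting the retrieval surface can be as simple as an allowlist of approved sources, as in this hypothetical sketch.

```python
# Hypothetical allowlist; in a regulated deployment this would be the
# audited set of approved sources for the application domain.
APPROVED_SOURCES = {"https://example.org/clinical-guidelines"}

def restrict_corpus(corpus):
    """Drop any chunk whose provenance is not on the approved list,
    keeping the retrieval surface small enough to audit."""
    return [d for d in corpus if d["source"] in APPROVED_SOURCES]
```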
Conclusion
Leading AI systems like ChatGPT, DeepSeek, and Mistral face formidable challenges in complying with the EU AI Act due to their reliance on opaque, uncurated datasets, unpredictable use cases, and cybersecurity vulnerabilities. These limitations highlight the difficulties of aligning general-purpose AI with stringent regulatory standards. In contrast, retrieval-augmented generation (RAG) models offer a viable path forward by leveraging curated, high-quality data to enhance transparency, reduce bias, and align with the AI Act’s requirements for trustworthy AI. As regulatory frameworks evolve, the industry may increasingly adopt hybrid approaches that combine the versatility of general-purpose AI with the precision and compliance of RAG models. By prioritizing structured data and robust governance, RAG systems pave the way for a new generation of AI that balances innovation with accountability.