The European Union recently introduced the AI Act, a new governance framework compelling organisations to enhance transparency regarding their AI systems’ training data.
Should this legislation come into force, it could pierce the veil of secrecy that many in Silicon Valley have drawn over the details of AI development and deployment.
Since OpenAI’s Microsoft-backed ChatGPT was released to the public 18 months ago, interest and investment in generative AI have grown significantly. These applications, which can write text, create images, and produce audio at record speeds, have attracted considerable attention. The accompanying surge in AI activity, however, raises a pressing question: where do AI developers actually source the data needed to train their models, and does it include unauthorised copyrighted material?
Implementing the AI Act
The EU’s AI Act, intended to be implemented gradually over the next two years, aims to address these issues. New laws take time to embed, and a gradual rollout gives regulators time to adapt to the new rules and businesses time to adjust to their new obligations. However, the implementation of some rules remains in doubt.
One of the more contentious sections of the Act stipulates that organisations deploying general-purpose AI models, such as ChatGPT, must provide “detailed summaries” of the content used to train them. The newly established AI Office has announced plans to release a template for organisations to follow in early 2025, following consultation with stakeholders.
AI companies have expressed strong resistance to revealing their training data, describing this information as trade secrets that would hand competitors an unfair advantage if made public. The level of detail required in these transparency reports will have significant implications for both smaller AI startups and major tech companies like Google and Meta, which have positioned AI technology at the centre of their future operations.
Over the past year, several top technology companies—Google, OpenAI, and Stability AI—have faced lawsuits from creators who claim their content was used without permission to train AI models. Under growing scrutiny, however, some tech companies have, in the past two years, broken with industry practice and negotiated content-licensing deals with individual media outlets and websites. Some creators and lawmakers remain concerned that these measures do not go far enough.
European lawmakers’ divide
In Europe, differences among lawmakers are stark. Dragos Tudorache, who led the drafting of the AI Act in the European Parliament, argues that AI companies should be required to open-source their datasets. Tudorache emphasises the importance of transparency so that creators can determine whether their work has been used to train AI algorithms.
Conversely, under the leadership of President Emmanuel Macron, the French government has privately opposed introducing rules that could hinder the competitiveness of European AI startups. French Finance Minister Bruno Le Maire has emphasised the need for Europe to be a world leader in AI, not merely a consumer of American and Chinese products.
The AI Act acknowledges the need to balance the protection of trade secrets with the facilitation of rights for parties with legitimate interests, including copyright holders. However, striking this balance remains a significant challenge.
Views differ across the industry. Matthieu Riouf, CEO of the AI-powered image-editing firm Photoroom, compares the situation to culinary practice: the best chefs, he claims, keep part of the recipe secret. His stance is representative of many commercial AI developers wary of disclosure. By contrast, Thomas Wolf, co-founder of Hugging Face, one of the world’s top AI startups, argues that while there will always be an appetite for transparency, that does not mean the entire industry will adopt a transparency-first approach.
A series of recent controversies has driven home just how complicated this all is. OpenAI demonstrated the latest version of ChatGPT in a public session and was roundly criticised for using a synthetic voice that sounded nearly identical to that of actress Scarlett Johansson. Such examples point to the potential for AI technologies to violate personal and proprietary rights.
Throughout the development of these regulations, there has been heated debate about their potential effects on future innovation and competitiveness in the AI world. In particular, the French government has urged that innovation, not regulation, should be the starting point, given the dangers of regulating aspects that have not been fully comprehended.
The way the EU regulates AI transparency could have significant impacts on tech companies, digital creators, and the overall digital landscape. Policymakers thus face the challenge of fostering innovation in the dynamic AI industry while simultaneously guiding it towards safe, ethical decisions and preventing IP infringement.
In sum, if adopted, the EU AI Act would be a significant step toward greater transparency in AI development. However, the practical implementation of these regulations, and their effects on the industry, may take years to materialise. Moving forward, especially at the dawn of this new regulatory paradigm, the balance between innovation, ethical AI development, and the protection of intellectual property will remain a central and contested issue for stakeholders of all stripes.