The past year has seen a steep rise in generative AI systems that claim to be open. But how open are they really? New research shows there is a widespread practice of “open-washing” by companies like Meta and Google: claiming brownie points for openness while evading actual scrutiny.
The question of what counts as open source in generative AI takes on particular importance in light of the EU AI Act, which regulates “open source” models differently, creating an urgent need for practical openness assessment.
Almost all the major tech corporations claim to provide “open” models, but very few actually do. Andreas Liesenfeld and Mark Dingemanse of Radboud University’s Center for Language Studies surveyed 45 text and text-to-image models that bill themselves as open. Their survey provides a stark view of the purported openness of current generative AI.
Their study was published recently at the ACM Conference on Fairness, Accountability and Transparency (ACM FAccT 2024), and profiled in a News brief in Nature.
Avoiding scrutiny
The researchers found that corporations like Meta, Microsoft and Mistral strategically co-opt terms like “open” and “open source” while in fact shielding their models almost entirely from scientific and regulatory scrutiny. These terms are frequently used for marketing purposes without any meaningful insight being provided into the source code, training data, fine-tuning data or architecture of the systems.
Building on their earlier work, the researchers put over 45 models to the test, this time also considering text-to-image generators. They found that openness is unevenly distributed and often overclaimed. By contrast, smaller players like AllenAI (with OLMo) and BigScience Workshop + HuggingFace (with BloomZ) often go the extra mile to document their systems and open them up to scrutiny.
EU AI Act
The recently introduced EU AI Act provides special exemptions for “open source” models, but doesn’t offer a clear definition of the term. This creates an incentive for open-washing: if models count as open, model providers face less onerous requirements and less public and scientific scrutiny. Liesenfeld states, “This makes it more important that we have clarity about what constitutes openness when it comes to generative AI. We don’t see openness as an all-or-nothing phenomenon, but as composite (consisting of multiple elements) and gradient (it comes in degrees).”
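To make the idea of composite and gradient openness concrete, here is a minimal Python sketch. It is an illustration only, not the researchers' actual assessment method: it uses just the four elements named in this article (source code, training data, fine-tuning data and architecture), and the three-step scale, the equal weighting and both example systems are assumptions made for the example.

```python
from enum import Enum


class Level(Enum):
    """Gradient: each element is closed, partially open, or open."""
    CLOSED = 0.0
    PARTIAL = 0.5
    OPEN = 1.0


# Composite: the four elements named in the article. The equal weighting
# and the 0 / 0.5 / 1 scale are assumptions made for this illustration.
ELEMENTS = ("source_code", "training_data", "fine_tuning_data", "architecture")


def openness_score(assessment: dict[str, Level]) -> float:
    """Return the mean openness across all elements, between 0 and 1."""
    missing = [e for e in ELEMENTS if e not in assessment]
    if missing:
        raise ValueError(f"unassessed elements: {missing}")
    return sum(assessment[e].value for e in ELEMENTS) / len(ELEMENTS)


# Hypothetical systems, not taken from the study's data.
weights_only = {
    "source_code": Level.PARTIAL,
    "training_data": Level.CLOSED,
    "fine_tuning_data": Level.CLOSED,
    "architecture": Level.PARTIAL,
}
fully_documented = {e: Level.OPEN for e in ELEMENTS}

print(openness_score(weights_only))      # 0.25
print(openness_score(fully_documented))  # 1.0
```

Scored this way, a system that shares little beyond partial code and architecture details lands far below one that documents every element, even if both are marketed as “open.”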
Though the EU AI Act creates more urgency, openness has long been recognized to be of key importance for innovation, science, and society. It can also build trust and understanding in AI by demystifying what it’s capable of. Dingemanse says, “If a company like OpenAI claims their AI can ‘pass the bar exam,’ this may or may not be impressive depending on what is in the training data.
“OpenAI has been notoriously vague about this, probably also to avoid legal exposure, but the sheer magnitude of training data means that ChatGPT and similar next word prediction engines can do most exams in ‘open book’ mode, making their performance much less impressive.”
The work helps build a case for meaningful openness in AI and brings to light a growing number of alternatives to ChatGPT. It comes shortly after Radboud University’s Faculty of Arts released guidance on generative AI and research integrity, which calls for more critical AI literacy among researchers considering the use of generative AI.
More information:
Andreas Liesenfeld et al, Rethinking open source generative AI: open washing and the EU AI Act, The 2024 ACM Conference on Fairness, Accountability, and Transparency (2024). DOI: 10.1145/3630106.3659005
Elizabeth Gibney, Not all ‘open source’ AI models are actually open: here’s a ranking, Nature (2024). DOI: 10.1038/d41586-024-02012-5
Radboud University
Citation:
‘Open-washing’ generative AI: How Meta, Google and others feign openness (2024, July 3)
retrieved 3 July 2024
from https://techxplore.com/news/2024-07-generative-ai-meta-google-feign.html