New dataset and models boost Portuguese language AI performance to match English

by Simon Osuji
July 23, 2025
in Artificial Intelligence

Large language models such as ChatGPT perform significantly worse in Portuguese than in English, even though both languages are spoken worldwide. This gap has now been closed with “GigaVerbo.” The team led by Dr. Nicholas Kluge Corrêa from the Center for Science and Thought at the University of Bonn presents the project in the journal Patterns. The researchers were among the first to use the University of Bonn's new “Marvin” supercomputer. Nicholas Kluge Corrêa and his colleague Aniket Sen are both members of the Transdisciplinary Research Area “Sustainable Futures” at the University of Bonn.

GigaVerbo is the name of the dataset developed by the researchers. The project “Tucano: Advancing Neural Text Generation for Portuguese” aims to bridge the resource gap in Portuguese natural language processing (NLP) by providing high-quality datasets and cutting-edge language models specifically designed for the Portuguese language.

By releasing the GigaVerbo corpus, which comprises 200 billion deduplicated tokens, together with the Tucano family of models, the team aims to foster open and reproducible progress in neural text generation and to promote equitable access to these resources.

The researchers collected several Portuguese corpora from different sources to ensure high linguistic diversity and quality. They then deduplicated and filtered these corpora to form the GigaVerbo dataset. Using this dataset, they trained several decoder models on the Marvin supercomputer and put the models through rigorous cycles of evaluation and optimization.

The project addresses two major gaps: first, the scarcity of comprehensive open-source resources for Portuguese, a language often overshadowed by resource-rich languages such as English; and second, the shortage of open-source LLM development, which hampers the scientific reproducibility of these models.

The researchers are currently working to scale up their Portuguese work by improving the dataset and training larger models. They are also developing resources for other low-resource languages, such as Bengali and Hindi, work made possible by Marvin and the University of Bonn.

More information:
Nicholas Kluge Corrêa et al, Tucano: Advancing neural text generation for Portuguese, Patterns (2025). DOI: 10.1016/j.patter.2025.101325

Provided by
University of Bonn

Citation:
New dataset and models boost Portuguese language AI performance to match English (2025, July 23)
retrieved 23 July 2025
from https://techxplore.com/news/2025-07-dataset-boost-portuguese-language-ai.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.

© 2026 LBNN – All rights reserved.