
Tool automatically separates training and test data to improve AI evaluation

by Simon Osuji
May 26, 2025
in Artificial Intelligence
DataSAIL splits data perfectly for artificial intelligence
Schematic workflow of DataSAIL. Credit: Nature Communications (2025). DOI: 10.1038/s41467-025-58606-8

Bioinformaticians at Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Helmholtz Institute for Pharmaceutical Research Saarland (HIPS) have developed a new tool to better assess the performance of AI models.

“DataSAIL” automatically sorts training and test data so that they differ as much as possible from each other, allowing for the evaluation of whether AI models work reliably with different data. The researchers have now presented their approach in the journal Nature Communications.

Machine learning models are trained with huge amounts of data and must be tested before practical use. For this, the data must first be divided into a larger training set and a smaller test set—the former is used for the model to learn, and the latter is used to check its reliability.
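As a point of reference, the conventional split described above can be sketched in a few lines of Python. This is a generic random split, not DataSAIL; the function name `random_split` and the 80/20 ratio are assumptions chosen for illustration.

```python
import random


def random_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and cut it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 100 sample IDs.
train, test = random_split(range(100), test_fraction=0.2)
print(len(train), len(test))
```

Because the assignment is random, near-duplicates of a training sample can easily end up in the test set, which is the weakness the researchers describe.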

“Only if the data is split in such a way that the test data differ significantly from the training data can it be determined whether the model can later handle novel data, so-called out-of-distribution data, in practice,” explains Prof. Dr. David Blumenthal, bioinformatician at the Department of Artificial Intelligence in Biomedical Engineering (AIBE) at FAU.

AI models are often overestimated

Conventional algorithms are usually not capable of this optimized data splitting, which is why the performance of AI models is often overestimated. Together with researchers from HIPS, David Blumenthal has therefore developed a tool that prevents such misjudgments and sets new standards in an important area of machine learning. The tool, called DataSAIL, automatically splits datasets so that training and test data are as different as possible.
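One simple way to keep training and test data dissimilar is to assign whole groups of similar samples to a single split. The sketch below illustrates that general idea only; it is not DataSAIL's algorithm, and the function `cluster_split`, the greedy filling strategy, and the protein-family labels are assumptions made for the example.

```python
from collections import defaultdict


def cluster_split(items, cluster_of, test_fraction=0.2):
    """Assign entire clusters to train or test so no cluster spans both."""
    clusters = defaultdict(list)
    for item in items:
        clusters[cluster_of(item)].append(item)

    target = test_fraction * len(items)
    train, test = [], []
    # Visit small clusters first and move them to the test set while it
    # still fits under the target size; everything else goes to training.
    for members in sorted(clusters.values(), key=len):
        if len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return train, test


# Hypothetical samples labelled with a similarity group (protein family).
samples = (
    [(f"kinase_{i}", "kinase") for i in range(4)]
    + [(f"protease_{i}", "protease") for i in range(3)]
    + [(f"gpcr_{i}", "gpcr") for i in range(2)]
    + [("channel_0", "channel")]
)
train_set, test_set = cluster_split(samples, cluster_of=lambda s: s[1], test_fraction=0.3)
print(sorted({fam for _, fam in train_set}), sorted({fam for _, fam in test_set}))
```

With this kind of split, the test set contains only families the model has never seen during training, so the measured performance reflects out-of-distribution behaviour more closely than a random split would.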

“DataSAIL is a free tool and can be used for all types of data, not just in biological research,” says Blumenthal. “Users only need to define a few parameters for their datasets, and DataSAIL does the rest automatically and consistently.”

Visualization of exemplary one-dimensional and two-dimensional datasets. Credit: Nature Communications (2025). DOI: 10.1038/s41467-025-58606-8

Tool also processes interaction data

DataSAIL is also the first tool that can be used for the automated splitting of interaction data. These multidimensional data play a role, for example, in drug research.

“Imagine you want to develop AI models that predict the interaction between drugs and target proteins,” says Blumenthal. “Then, when testing these models, you need to evaluate how well they work for altered drug molecules on one hand and for different proteins on the other.”
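The two-dimensional case Blumenthal describes can be illustrated with a small sketch. It assumes that drugs and proteins have already been divided into training and test groups by some other method, and it shows only how drug-protein pairs could then be assigned. The function `split_interactions`, the choice to discard mixed pairs, and the example names are illustrative assumptions, not DataSAIL's interface.

```python
def split_interactions(interactions, test_drugs, test_proteins):
    """Assign each (drug, protein) pair to train, test, or dropped.

    A pair is a test pair only if both its drug and its protein are
    held out; it is a training pair only if neither is. Mixed pairs are
    dropped so that no test entity is seen during training.
    """
    train, test, dropped = [], [], []
    for drug, protein in interactions:
        drug_held_out = drug in test_drugs
        protein_held_out = protein in test_proteins
        if drug_held_out and protein_held_out:
            test.append((drug, protein))
        elif not drug_held_out and not protein_held_out:
            train.append((drug, protein))
        else:
            dropped.append((drug, protein))
    return train, test, dropped


# Hypothetical interaction data.
pairs = [("drugA", "prot1"), ("drugA", "prot2"), ("drugB", "prot1"), ("drugB", "prot2")]
train_pairs, test_pairs, dropped_pairs = split_interactions(
    pairs, test_drugs={"drugB"}, test_proteins={"prot2"}
)
print(train_pairs, test_pairs, dropped_pairs)
```

The dropped pairs show the cost of splitting along both dimensions at once: interactions that mix a training entity with a test entity cannot be used in either set without leaking information.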

Additionally, the tool can take class features into account, such as an even distribution of male and female subjects across training and test data. This prevents model testing from yielding less realistic results for one gender than for the other.
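Keeping a class feature balanced across both sets is commonly called stratification. A minimal sketch of the idea follows, assuming each sample carries a class label; the function `stratified_split` and the example cohort are hypothetical and do not reflect how DataSAIL implements this.

```python
import random
from collections import defaultdict


def stratified_split(samples, label_of, test_fraction=0.2, seed=0):
    """Split each class separately so both sets keep the class ratio."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[label_of(sample)].append(sample)

    rng = random.Random(seed)
    train, test = [], []
    for members in by_label.values():
        rng.shuffle(members)
        # Take the same fraction of every class for the test set.
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical cohort: 60 female and 40 male subjects.
cohort = [(i, "female") for i in range(60)] + [(i, "male") for i in range(60, 100)]
train_cohort, test_cohort = stratified_split(cohort, label_of=lambda s: s[1], test_fraction=0.25)
print(len(train_cohort), len(test_cohort))
```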

The plan is to further develop the tool in the coming years to reduce the runtime of the algorithms and prepare data even more precisely for various practical scenarios.

More information:
Roman Joeres et al, Data splitting to avoid information leakage with DataSAIL, Nature Communications (2025). DOI: 10.1038/s41467-025-58606-8

Provided by
Friedrich–Alexander University Erlangen–Nurnberg

Citation:
Tool automatically separates training and test data to improve AI evaluation (2025, May 26)
retrieved 26 May 2025
from https://techxplore.com/news/2025-05-tool-automatically-ai.html






© 2026 LBNN – All rights reserved.
