• Business
  • Markets
  • Politics
  • Crypto
  • Finance
  • Intelligence
    • Policy Intelligence
    • Security Intelligence
    • Economic Intelligence
    • Fashion Intelligence
  • Energy
  • Technology
  • Taxes
  • Creator Economy
  • Wealth Management
  • LBNN Blueprints
  • Business
  • Markets
  • Politics
  • Crypto
  • Finance
  • Intelligence
    • Policy Intelligence
    • Security Intelligence
    • Economic Intelligence
    • Fashion Intelligence
  • Energy
  • Technology
  • Taxes
  • Creator Economy
  • Wealth Management
  • LBNN Blueprints

A high schooler built a website that lets you challenge AI models to a Minecraft build-off

Simon Osuji by Simon Osuji
March 20, 2025
in Creator Economy
0
A high schooler built a website that lets you challenge AI models to a Minecraft build-off
0
SHARES
3
VIEWS
Share on FacebookShare on Twitter

As conventional AI benchmarking techniques prove inadequate, AI builders are turning to more creative ways to assess the capabilities of generative AI models. For one group of developers, that’s Minecraft, the Microsoft-owned sandbox-building game.

The website Minecraft Benchmark (or MC-Bench) was developed collaboratively to pit AI models against each other in head-to-head challenges to respond to prompts with Minecraft creations. Users can vote on which model did a better job, and only after voting can they see which AI made each Minecraft build.

Image Credits:Minecraft Benchmark (opens in a new window)

For Adi Singh, the 12th-grader who started MC-Bench, the value of Minecraft isn’t so much the game itself, but the familiarity that people have with it — after all, it is the best-selling video game of all time. Even for people who haven’t played the game, it’s still possible to evaluate which blocky representation of a pineapple is better realized.

“Minecraft allows people to see the progress [of AI development] much more easily,” Singh told TechCrunch. “People are used to Minecraft, used to the look and the vibe.”

MC-Bench currently lists eight people as volunteer contributors. Anthropic, Google, OpenAI, and Alibaba have subsidized the project’s use of their products to run benchmark prompts, per MC-Bench’s website, but the companies are not otherwise affiliated.

“Currently we are just doing simple builds to reflect on how far we’ve come from the GPT-3 era, but [we] could see ourselves scaling to these longer-form plans and goal-oriented tasks,” Singh said. “Games might just be a medium to test agentic reasoning that is safer than in real life and more controllable for testing purposes, making it more ideal in my eyes.”

Other games like Pokémon Red, Street Fighter, and Pictionary have been used as experimental benchmarks for AI, in part because the art of benchmarking AI is notoriously tricky.

Researchers often test AI models on standardized evaluations, but many of these tests give AI a home-field advantage. Because of the way they’re trained, models are naturally gifted at certain, narrow kinds of problem-solving, particularly problem-solving that requires rote memorization or basic extrapolation.

Put simply, it’s hard to glean what it means that OpenAI’s GPT-4 can score in the 88th percentile on the LSAT, but cannot discern how many Rs are in the word “strawberry.” Anthropic’s Claude 3.7 Sonnet achieved 62.3% accuracy on a standardized software engineering benchmark, but it is worse at playing Pokémon than most five-year-olds.

MC-Bench is technically a programming benchmark, since the models are asked to write code to create the prompted build, like “Frosty the Snowman” or “a charming tropical beach hut on a pristine sandy shore.”

But it’s easier for most MC-Bench users to evaluate whether a snowman looks better than to dig into code, which gives the project wider appeal — and thus the potential to collect more data about which models consistently score better.

Whether those scores amount to much in the way of AI usefulness is up for debate, of course. Singh asserts that they’re a strong signal, though.

“The current leaderboard reflects quite closely to my own experience of using these models, which is unlike a lot of pure text benchmarks,” Singh said. “Maybe [MC-Bench] could be useful to companies to know if they’re heading in the right direction.”

Source link

Related posts

Google Search rolls out Gemini’s Canvas in AI Mode to all US users

Google Search rolls out Gemini’s Canvas in AI Mode to all US users

March 4, 2026
MacBook Neo, iPhone 17e, and everything else Apple announced this week

MacBook Neo, iPhone 17e, and everything else Apple announced this week

March 4, 2026
Previous Post

KBR Wins $229M CH-47 Chinook Helicopter Modernization Support Deal

Next Post

Seychelles and Brazil to look into widening areas of bilateral cooperation

Next Post
Seychelles and Brazil to look into widening areas of bilateral cooperation

Seychelles and Brazil to look into widening areas of bilateral cooperation

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

RECOMMENDED NEWS

Concerns Persist Over Oversight of Private Security Linked to Taxi Industry and Paramilitary Training

Concerns Persist Over Oversight of Private Security Linked to Taxi Industry and Paramilitary Training

9 months ago
5 non-African destinations Nigerian passport holders can explore without visa

5 non-African destinations Nigerian passport holders can explore without visa

1 year ago
23 Best Black Friday Deals on Outdoor Gear (2023): REI, Garmin, and More

23 Best Black Friday Deals on Outdoor Gear (2023): REI, Garmin, and More

2 years ago
Polymer bearings can decarbonise the beer industry

Polymer bearings can decarbonise the beer industry

2 years ago

POPULAR NEWS

  • Ghana to build three oil refineries, five petrochemical plants in energy sector overhaul

    Ghana to build three oil refineries, five petrochemical plants in energy sector overhaul

    0 shares
    Share 0 Tweet 0
  • Mahama attends Liberia’s 178th independence anniversary

    0 shares
    Share 0 Tweet 0
  • The world’s top 10 most valuable car brands in 2025

    0 shares
    Share 0 Tweet 0
  • Top 10 African countries with the highest GDP per capita in 2025

    0 shares
    Share 0 Tweet 0
  • Global ranking of Top 5 smartphone brands in Q3, 2024

    0 shares
    Share 0 Tweet 0

Get strategic intelligence you won’t find anywhere else. Subscribe to the Limitless Beliefs Newsletter for monthly insights on overlooked business opportunities across Africa.

Subscription Form

© 2026 LBNN – All rights reserved.

Privacy Policy | About Us | Contact

Tiktok Youtube Telegram Instagram Linkedin X-twitter
No Result
View All Result
  • Home
  • Business
  • Politics
  • Markets
  • Crypto
  • Economics
    • Manufacturing
    • Real Estate
    • Infrastructure
  • Finance
  • Energy
  • Creator Economy
  • Wealth Management
  • Taxes
  • Telecoms
  • Military & Defense
  • Careers
  • Technology
  • Artificial Intelligence
  • Investigative journalism
  • Art & Culture
  • LBNN Blueprints
  • Quizzes
    • Enneagram quiz
  • Fashion Intelligence

© 2023 LBNN - All rights reserved.