Wednesday, May 21, 2025
LBNN

How OpenAI’s bot crushed this seven-person company’s website ‘like a DDoS attack’

by Simon Osuji
January 12, 2025
in Creator Economy

On Saturday, Triplegangers CEO Oleksandr Tomchuk was alerted that his company’s e-commerce site was down. It looked to be some kind of distributed denial-of-service attack. 

He soon discovered the culprit was a bot from OpenAI that was relentlessly attempting to scrape his entire, enormous site. 

“We have over 65,000 products, each product has a page,” Tomchuk told TechCrunch. “Each page has at least three photos.” 

OpenAI was sending “tens of thousands” of server requests trying to download all of it, hundreds of thousands of photos, along with their detailed descriptions. 

“OpenAI used 600 IPs to scrape data, and we are still analyzing logs from last week, perhaps it’s way more,” he said of the IP addresses the bot used to attempt to consume his site. 

“Their crawlers were crushing our site,” he said. “It was basically a DDoS attack.”

Triplegangers’ website is its business. The seven-employee company has spent over a decade assembling what it calls the largest database of “human digital doubles” on the web, meaning 3D image files scanned from actual human models. 

It sells the 3D object files, as well as photos — everything from hands to hair, skin, and full bodies — to 3D artists, video game makers, anyone who needs to digitally recreate authentic human characteristics.

Tomchuk’s team, based in Ukraine but also licensed in the U.S. out of Tampa, Florida, has a terms of service page on its site that forbids bots from taking its images without permission. But that alone did nothing. Websites must use a properly configured robots.txt file with tags specifically telling OpenAI’s bot, GPTBot, to leave the site alone. (OpenAI also has a couple of other bots, ChatGPT-User and OAI-SearchBot, that have their own tags, according to its information page on its crawlers.)

Robots.txt, otherwise known as the Robots Exclusion Protocol, was created to tell search engine sites what not to crawl as they index the web. OpenAI says on its informational page that it honors such files when configured with its own set of do-not-crawl tags, though it also warns that it can take its bots up to 24 hours to recognize an updated robots.txt file.
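To make the mechanism concrete, here is a minimal sketch of what such do-not-crawl tags might look like and how a site owner could check them locally with Python's standard `urllib.robotparser` module. The three user-agent names come from the article; the catch-all `Allow` rule, the `example.com` URL and the `is_allowed` helper are assumptions made for illustration, not Triplegangers' actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block OpenAI's three crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder, not the real site.
    for agent in ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "Googlebot"):
        print(agent, is_allowed(agent, "https://example.com/products/1"))
```

A check like this only confirms what the file asks for. Whether a crawler obeys it is up to the crawler.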

As Tomchuk experienced, if a site isn’t properly using robots.txt, OpenAI and others take that to mean they can scrape to their hearts’ content. It’s not an opt-in system.

To add insult to injury, not only was Triplegangers knocked offline by OpenAI’s bot during U.S. business hours, but Tomchuk expects a jacked-up AWS bill thanks to all of the CPU and downloading activity from the bot.

Robots.txt also isn’t a failsafe. AI companies voluntarily comply with it. Another AI startup, Perplexity, pretty famously got called out last summer by a Wired investigation when some evidence implied Perplexity wasn’t honoring it.

Each of these is a product, with a product page that includes multiple more photos. Used by permission. Image Credits: Triplegangers

Can’t know for certain what was taken

By Wednesday, after days of OpenAI’s bot returning, Triplegangers had a properly configured robots.txt file in place, and also a Cloudflare account set up to block GPTBot and several other bots Tomchuk discovered, like Barkrowler (an SEO crawler) and Bytespider (TikTok’s crawler). He is also hopeful he’s blocked crawlers from other AI model companies. On Thursday morning, the site didn’t crash, he said.

But Tomchuk still has no reasonable way to find out exactly what OpenAI successfully took or to get that material removed. He’s found no way to contact OpenAI and ask. OpenAI did not respond to TechCrunch’s request for comment. And OpenAI has so far failed to deliver its long-promised opt-out tool, as TechCrunch recently reported.

This is an especially tricky issue for Triplegangers. “We’re in a business where the rights are kind of a serious issue, because we scan actual people,” he said. With laws like Europe’s GDPR, “they cannot just take a photo of anyone on the web and use it.”

Triplegangers’ website was also an especially delicious find for AI crawlers. Multibillion-dollar-valued startups, like Scale AI, have been created where humans painstakingly tag images to train AI. Triplegangers’ site contains photos tagged in detail: ethnicity, age, tattoos versus scars, all body types, and so on.

The irony is that the OpenAI bot’s greediness is what alerted Triplegangers to how exposed it was. Had it scraped more gently, Tomchuk never would have known, he said.

“It’s scary because there seems to be a loophole that these companies are using to crawl data by saying ‘you can opt out if you update your robots.txt with our tags,’” says Tomchuk, but that puts the onus on the business owner to understand how to block them.

Triplegangers’ server logs showed how ruthlessly an OpenAI bot was accessing the site, from hundreds of IP addresses. Used by permission.

He wants other small online businesses to know that the only way to discover if an AI bot is taking a website’s copyrighted belongings is to actively look. He’s certainly not alone in being terrorized by them. Owners of other websites recently told Business Insider how OpenAI bots crashed their sites and ran up their AWS bills.

The problem grew sharply in 2024. New research from digital advertising company DoubleVerify found that AI crawlers and scrapers caused an 86% increase in “general invalid traffic” in 2024, that is, traffic that doesn’t come from a real user.

Still, “most sites remain clueless that they were scraped by these bots,” warns Tomchuk. “Now we have to daily monitor log activity to spot these bots.”
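As a rough illustration of what that daily monitoring could involve, the sketch below counts requests and distinct IP addresses per crawler in a web server access log. It assumes the common "combined" log format and matches the bot names mentioned in the article as substrings of the user-agent field. The `summarize_bots` function, the sample log lines, the user-agent strings and the IP addresses are all invented for the example and are not taken from Triplegangers' logs.

```python
import re
from collections import defaultdict

# Assumed combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Crawler names taken from the article; extend as needed.
BOT_TOKENS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "Bytespider", "Barkrowler")


def summarize_bots(lines, tokens=BOT_TOKENS):
    """Return {bot_name: (request_count, distinct_ip_count)} for matching log lines."""
    requests = defaultdict(int)
    ips = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent").lower()
        for token in tokens:
            if token.lower() in agent:
                requests[token] += 1
                ips[token].add(match.group("ip"))
                break
    return {token: (requests[token], len(ips[token])) for token in requests}


# Invented sample lines using documentation-only IP ranges.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Jan/2025:10:00:00 +0000] "GET /p/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [10/Jan/2025:10:00:01 +0000] "GET /p/2 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [10/Jan/2025:10:00:02 +0000] "GET /p/3 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [10/Jan/2025:10:00:03 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for bot, (count, ip_count) in summarize_bots(SAMPLE_LOG).items():
        print(f"{bot}: {count} requests from {ip_count} IPs")
```

User-agent strings can be spoofed, so a count like this is only a first signal. A gentler crawler that identifies itself as an ordinary browser would not show up.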

When you think about it, the whole model operates a bit like a mafia shakedown: The AI bots will take what they want unless you have protection.

“They should be asking permission, not just scraping data,” Tomchuk says.
