On Saturday, Triplegangers CEO Oleksandr Tomchuk was alerted that his company's ecommerce site was down. It looked like some kind of distributed denial-of-service attack.
He soon discovered the culprit was a bot from OpenAI that was relentlessly attempting to scrape his entire, enormous site.
“We have over 65,000 products, and each product has a page,” Tomchuk told TechCrunch. “Each page has at least three photos.”
OpenAI was sending “tens of thousands” of server requests trying to download all of it: hundreds of thousands of photos, along with their detailed descriptions.
“OpenAI used 600 IPs to scrape data, and we are still analyzing logs from last week, perhaps it's much more,” he said of the IP addresses the bot used in its attempt to consume his site.
“Their crawlers were crushing our site,” he said. “It was basically a DDoS attack.”
Triplegangers' website is its business. The seven-employee company has spent over a decade assembling what it calls the largest database of “human digital doubles” on the web, meaning 3D image files scanned from actual human models.
It sells the 3D object files, as well as photos (everything from hands to hair, skin, and full bodies), to 3D artists, video game makers, and anyone who needs to digitally recreate authentic human features.
Tomchuk's team, based in Ukraine but also licensed in the U.S. out of Tampa, Florida, has a terms of service page on its site that forbids bots from taking its images without permission. But that alone did nothing. Websites must use a properly configured robots.txt file with tags specifically telling OpenAI's bot, GPTBot, to leave the site alone. (OpenAI also has a couple of other bots, ChatGPT-User and OAI-SearchBot, which have their own tags, according to its information page on its crawlers.)
Robots.txt, otherwise known as the Robots Exclusion Protocol, was created to tell search engines what not to crawl as they index the web. OpenAI says on its informational page that it honors such files when they are configured with its own set of do-not-crawl tags, though it also warns that it can take its bots up to 24 hours to recognize an updated robots.txt file.
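A minimal robots.txt along these lines might look like the following sketch, using the user-agent names OpenAI publishes for its crawlers (site owners should verify the current names against OpenAI's own documentation):

```
# Block OpenAI's crawlers site-wide (user-agent names per OpenAI's crawler docs)
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
```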
As Tomchuk experienced, if a site isn't properly using robots.txt, OpenAI and others take that to mean they can scrape to their hearts' content. It's not an opt-in system.
To add insult to injury, not only was Triplegangers knocked offline by OpenAI's bot during U.S. business hours, but Tomchuk expects a jacked-up AWS bill thanks to all of the CPU and downloading activity from the bot.
Robots.txt also isn't a failsafe. AI companies comply with it voluntarily. Another AI startup, Perplexity, quite famously got called out last summer by a Wired investigation when some evidence implied Perplexity wasn't honoring it.
Can’t know for sure what was taken
By Wednesday, after days of OpenAI's bot returning, Triplegangers had a properly configured robots.txt file in place, and also a Cloudflare account set up to block its GPTBot and several other bots he discovered, like Barkrowler (an SEO crawler) and Bytespider (TikTok's crawler). Tomchuk is also hopeful he's blocked crawlers from other AI model companies. On Thursday morning, the site didn't crash, he said.
But Tomchuk still has no reasonable way to find out exactly what OpenAI successfully took or to get that material removed. He's found no way to contact OpenAI and ask. OpenAI did not respond to TechCrunch's request for comment. And OpenAI has so far failed to deliver its long-promised opt-out tool, as TechCrunch recently reported.
This is an especially tricky issue for Triplegangers. “We are in a business where the rights are kind of a serious issue, because we scan actual people,” he said. With laws like Europe's GDPR, “they cannot just take a photo of anyone on the web and use it.”
Triplegangers' website was also an especially delicious find for AI crawlers. Multibillion-dollar-valued startups, like Scale AI, have been created where humans painstakingly tag images to train AI. Triplegangers' site contains photos tagged in detail: ethnicity, age, tattoos vs. scars, all body types, and so on.
The irony is that the OpenAI bot's greediness is what alerted Triplegangers to how exposed it was. Had it scraped more gently, Tomchuk would never have known, he said.
“It's scary because there seems to be a loophole that these companies are using to crawl data by saying ‘you can opt out if you update your robots.txt with our tags,’” says Tomchuk, but that puts the onus on the business owner to understand how to block them.

He wants other small online businesses to know that the only way to discover whether an AI bot is taking a website's copyrighted assets is to actively look. He's certainly not alone in being terrorized by them. Owners of other websites recently told Business Insider how OpenAI bots crashed their sites and ran up their AWS bills.
The problem grew by orders of magnitude in 2024. New research from digital advertising company DoubleVerify found that AI crawlers and scrapers caused an 86% increase in “general invalid traffic” in 2024, that is, traffic that doesn't come from a real user.
Still, “most sites remain clueless that they were scraped by these bots,” warns Tomchuk. “Now we have to monitor log activity daily to spot these bots.”
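That kind of daily monitoring can start as simply as tallying known crawler user-agents in a web server's access log. The following is a minimal sketch, not Triplegangers' actual setup; the log path and sample entries are illustrative, and the user-agent strings are the ones named in this article:

```shell
# Tally hits per known AI/SEO crawler user-agent in an access log.
# The log file and entries below are fabricated for illustration.
LOG=access.log
cat > "$LOG" <<'EOF'
1.2.3.4 - - [08/Jan/2025] "GET /p/1 HTTP/1.1" 200 "Mozilla/5.0 ... GPTBot/1.1"
5.6.7.8 - - [08/Jan/2025] "GET /p/2 HTTP/1.1" 200 "Mozilla/5.0 ... Bytespider"
1.2.3.4 - - [08/Jan/2025] "GET /p/3 HTTP/1.1" 200 "Mozilla/5.0 ... GPTBot/1.1"
EOF

# One line of output per bot: "<name> <matching request count>"
for bot in GPTBot ChatGPT-User OAI-SearchBot Bytespider Barkrowler; do
  printf '%s %s\n' "$bot" "$(grep -c "$bot" "$LOG")"
done
```

Against a real deployment, the here-doc would be replaced with the server's actual log path (for example, nginx's `/var/log/nginx/access.log`), and a count that climbs day over day is the signal to investigate.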
When you think about it, the whole model operates a bit like a mafia shakedown: the AI bots will take what they want unless you have protection.
“They should be asking permission, not just scraping data,” Tomchuk says.