OpenAI’s Bot Crushes Seven-Person Company’s Website ‘Like a DDoS Attack’


An anonymous reader quotes a report from TechCrunch: On Saturday, Triplegangers CEO Oleksandr Tomchuk was alerted that his company’s e-commerce site was down. It looked to be some kind of distributed denial-of-service attack. He soon discovered the culprit was a bot from OpenAI that was relentlessly attempting to scrape his entire, enormous site. “We have over 65,000 products, each product has a page,” Tomchuk told TechCrunch. “Each page has at least three photos.” OpenAI was sending “tens of thousands” of server requests trying to download all of it: hundreds of thousands of photos, along with their detailed descriptions. “OpenAI used 600 IPs to scrape data, and we are still analyzing logs from last week, perhaps it’s way more,” he said of the IP addresses the bot used to attempt to consume his site. “Their crawlers were crushing our site,” he said. “It was basically a DDoS attack.”
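The article doesn’t reproduce the logs, but the kind of analysis Tomchuk describes, tallying requests per client IP to surface a crawler’s address ranges, can be sketched in a few lines of Python. The log path and the standard nginx/Apache “combined” log format are assumptions for illustration, not details from the report:

    # Count requests per client IP in a "combined"-format access log
    # (assumed format; the client IP is the first field on each line).
    import re
    from collections import Counter

    LOG_PATH = "access.log"  # hypothetical path
    LINE_RE = re.compile(r"^(\S+) ")

    counts = Counter()
    with open(LOG_PATH) as log:
        for line in log:
            match = LINE_RE.match(line)
            if match:
                counts[match.group(1)] += 1

    # IPs with request volumes far above normal traffic are scraper candidates.
    for ip, hits in counts.most_common(20):
        print(f"{hits:>8}  {ip}")

An IP-level tally like this is what surfaces a pattern such as the 600 addresses Tomchuk mentions, since no single human visitor generates tens of thousands of requests.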

Triplegangers’ website is its business. The seven-employee company has spent over a decade assembling what it calls the largest database of “human digital doubles” on the web, meaning 3D image files scanned from actual human models. It sells the 3D object files, as well as photos — everything from hands to hair, skin, and full bodies — to 3D artists, video game makers, and anyone who needs to digitally recreate authentic human characteristics. […] To add insult to injury, not only was Triplegangers knocked offline by OpenAI’s bot during U.S. business hours, but Tomchuk expects a jacked-up AWS bill thanks to all of the CPU and downloading activity from the bot. Triplegangers initially lacked a properly configured robots.txt file, which left the bot free to scrape the site, since crawlers treat the absence of such a file as permission. It’s an opt-out system, not an opt-in one.
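For context, GPTBot is the user-agent token OpenAI publishes for its crawler and says it honors in robots.txt. A minimal opt-out, served at the site root as /robots.txt, would look like the following (the article does not reproduce Triplegangers’ actual file):

    User-agent: GPTBot
    Disallow: /

Because the protocol is purely advisory, a rule like this only helps against crawlers that choose to respect it, which is exactly the caveat raised below.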

Once the file was updated with directives blocking OpenAI’s bot, along with additional defenses via Cloudflare, the scraping stopped. However, robots.txt is not foolproof, since compliance by AI companies is voluntary, leaving the burden on website owners to monitor and block unauthorized access proactively. “[Tomchuk] wants other small online businesses to know that the only way to discover if an AI bot is taking a website’s copyrighted belongings is to actively look,” reports TechCrunch.
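As a rough sketch of what “actively look” means in practice, a site owner can scan access logs for the user-agent tokens of known AI crawlers. GPTBot and ChatGPT-User are OpenAI’s published tokens and CCBot is Common Crawl’s; the log path and “combined” log format, where the user agent is the last quoted field, are again assumptions for illustration:

    # Print log lines whose user agent matches a known AI crawler token.
    import re

    LOG_PATH = "access.log"  # hypothetical path
    AI_CRAWLERS = ("GPTBot", "ChatGPT-User", "CCBot")  # partial list

    # In the combined log format the final quoted field is the user agent.
    UA_RE = re.compile(r'"([^"]*)"\s*$')

    with open(LOG_PATH) as log:
        for line in log:
            match = UA_RE.search(line)
            if match and any(bot in match.group(1) for bot in AI_CRAWLERS):
                print(line.rstrip())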


