The Wikimedia Foundation, the umbrella organization for Wikipedia and a dozen or so other crowdsourced knowledge projects, said on Wednesday that bandwidth consumption for multimedia downloads from Wikimedia Commons has surged 50% since January 2024.
The reason, the organization wrote in a blog post Tuesday, isn't growing demand from knowledge-hungry humans, but data-hungry scrapers looking to train artificial intelligence models.
“Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs,” the post says.
Wikimedia Commons is a freely accessible repository of images, videos, and audio files available under open licenses or otherwise in the public domain.
Digging into the numbers, Wikimedia says nearly two-thirds (65%) of its most “expensive” traffic (the most resource-intensive in terms of the kind of content consumed) came from bots. Yet those same bots account for only 35% of overall pageviews. The reason for this disparity, according to Wikimedia, is that frequently accessed content stays closer to the user in its cache, while less frequently accessed content is stored farther away in the “core data center,” from which it is more expensive to serve. And that is exactly the kind of content bots typically go looking for.
“While human readers tend to focus on specific – often similar – topics, crawler bots tend to ‘bulk read’ larger numbers of pages and visit also the less popular pages,” Wikimedia writes. “This means these types of requests are more likely to get forwarded to the core data center, which makes it much more expensive in terms of consumption of our resources.”
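To make the economics concrete, here is a minimal sketch of the asymmetry Wikimedia is describing. The hit rates and cost ratios below are illustrative assumptions (Wikimedia publishes neither), but they show how a minority of pageviews can still dominate serving costs:

```python
# Illustrative model of edge-cache vs. core-data-center serving costs.
# All numbers are hypothetical; they demonstrate the asymmetry
# Wikimedia describes, not its real traffic or pricing.

CACHE_COST = 1.0    # relative cost of serving a request from the edge cache
ORIGIN_COST = 10.0  # relative cost of serving from the core data center

def serving_cost(requests: int, cache_hit_rate: float) -> float:
    """Total relative cost for a batch of requests at a given cache hit rate."""
    hits = requests * cache_hit_rate
    misses = requests - hits
    return hits * CACHE_COST + misses * ORIGIN_COST

# Humans cluster on popular pages, so most of their requests hit the cache.
human_cost = serving_cost(requests=650, cache_hit_rate=0.95)

# Crawlers "bulk read" the long tail, so far more of their requests miss
# the cache and get forwarded to the core data center.
bot_cost = serving_cost(requests=350, cache_hit_rate=0.30)

print(f"humans: {human_cost:.0f}  bots: {bot_cost:.0f}")
# With roughly a third of the pageviews, the bots' bill still dominates.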
The long and short of all this is that the Wikimedia Foundation’s site reliability team has to spend a lot of time and resources blocking crawlers to avert disruption for regular users. And all this before considering the cloud costs the Foundation faces.
In fact, this is part of a fast-growing trend that threatens the very existence of the open internet. Last month, software engineer and open source advocate Drew DeVault lamented that AI crawlers ignore “robots.txt” files designed to ward off automated traffic. And “pragmatic engineer” Gergely Orosz complained last week that AI scrapers from companies like Meta have driven up bandwidth demands for his own projects.
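For reference, robots.txt is a plain-text file served at a site’s root that asks crawlers to keep out. The snippet below is an illustrative example (GPTBot and CCBot are real, documented crawler user-agent tokens); DeVault’s complaint is precisely that compliance is voluntary, and many AI crawlers simply ignore it:

```
# robots.txt -- served at https://example.com/robots.txt
# Ask specific AI crawlers to stay out entirely.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else may crawl, except a hypothetical expensive long-tail section.
User-agent: *
Disallow: /archive/
```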
While open source infrastructure in particular is in the firing line, developers are fighting back with “cleverness and vengeance,” as TechCrunch wrote last week. Some tech companies are doing their bit to address the issue, too. Cloudflare, for example, recently launched AI Labyrinth, which uses AI-generated content to slow crawlers down.
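Cloudflare hasn’t published AI Labyrinth’s internals, but the general “maze” idea can be sketched in a few lines: suspected bots get served generated decoy pages full of links to further decoys, burning their crawl budget away from real content. The handler below is a toy illustration of that idea, not Cloudflare’s implementation, and the bot list is assumed:

```python
# Toy sketch of a crawler "maze": suspected bots receive generated decoy
# pages linking to more decoys. Illustrative only; not Cloudflare's code.
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

SUSPECT_AGENTS = ("GPTBot", "CCBot", "Bytespider")  # illustrative list

def decoy_page(depth: int) -> str:
    # Each decoy links to several deeper decoys, forming an endless maze.
    links = "".join(
        f'<a href="/maze/{depth + 1}/{random.randint(0, 9999)}">more</a>'
        for _ in range(5)
    )
    return f"<html><body><p>Filler text {depth}</p>{links}</body></html>"

class MazeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if any(bot in agent for bot in SUSPECT_AGENTS):
            body = decoy_page(self.path.count("/")).encode()
        else:
            body = b"<html><body>Real content for real readers.</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), MazeHandler).serve_forever()
```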
However, it’s very much a cat-and-mouse game, one that could ultimately force many publishers to take cover behind logins and paywalls, to the detriment of everyone who uses the web today.