Leaked data exposes a Chinese AI censorship machine

By [email protected]


A complaint about poverty in rural China. A news report about a corrupt Communist Party member. A cry for help about corrupt police officers shaking down entrepreneurs.

These are just a few of the 133,000 examples fed into a sophisticated large language model (LLM) that is designed to automatically flag any content the Chinese government considers sensitive.

A leaked database seen by TechCrunch reveals that China has developed an AI system that supercharges its already formidable censorship machine, extending far beyond traditional taboos such as the Tiananmen Square massacre.

The system appears primarily geared toward censoring Chinese citizens online, but it could be used for other purposes, such as strengthening the already extensive censorship in Chinese AI models.

This photo, taken on June 4, 2019, shows the Chinese flag behind razor wire at a housing compound in Yengisar, south of Kashgar, in China's western Xinjiang region. Image credits: Greg Baker / AFP / Getty Images

Xiao Qiang, a researcher at UC Berkeley who studies Chinese censorship and who also examined the dataset, told TechCrunch that it was “clear evidence” that the Chinese government or its affiliates want to use LLMs to improve repression.

“Unlike traditional censorship mechanisms, which rely on human labor for keyword-based filtering and manual review, an LLM trained on such instructions would significantly improve the efficiency and granularity of state-led information control,” Qiang told TechCrunch.

This adds to growing evidence that authoritarian regimes are quickly adopting the latest AI technology. In February, for example, OpenAI said it had caught multiple Chinese entities using LLMs to track anti-government posts and smear Chinese dissidents.

The Chinese Embassy in Washington, D.C., said in a statement that it opposes “groundless attacks and slanders against China” and that China attaches great importance to developing ethical AI.

Data found in plain sight

The dataset was discovered by security researcher NetAskari, who shared a sample with TechCrunch after finding it stored in an unsecured Elasticsearch database hosted on a Baidu server.

This does not indicate any involvement from either company; all kinds of organizations store their data with these providers.

There is no indication of who, exactly, built the dataset, but records show that the data is recent, with its latest entries dating from December 2024.

An LLM for detecting dissent

In language eerily reminiscent of how people prompt ChatGPT, the system's creator tasks an unnamed LLM with figuring out whether a piece of content has anything to do with sensitive topics related to politics, social life, and the military. Such content is deemed “highest priority” and must be flagged immediately.

Top-priority topics include pollution and safety scandals, financial fraud, and labor disputes, which are hot-button issues in China that sometimes lead to public protests, such as the Shifang anti-pollution protests of 2012.

Any form of “political satire” is explicitly targeted. For example, if someone uses historical analogies to make a point about “current political figures,” that must be flagged, and so must anything related to “Taiwan politics.” Military matters are extensively targeted, including reports of military movements, exercises, and weaponry.

A snippet of the dataset can be seen below. The code inside it references prompt tokens and LLMs, confirming that the system uses an AI model to do its bidding:

Excerpt of JSON code that references prompt tokens and LLMs. Much of the content is in Chinese.
Image credits: Charles Rollet

Inside the training data

From this huge collection of 133,000 examples that the LLM must evaluate for censorship, TechCrunch gathered 10 representative pieces of content.

Topics likely to stir up social unrest are a recurring theme. One snippet, for example, is a post by a business owner complaining about corrupt local police officers shaking down entrepreneurs, a rising issue in China as its economy struggles.

Another piece of content laments rural poverty in China, describing run-down towns where only elderly people and children are left. There is also a news report about the Chinese Communist Party (CCP) expelling a local official for severe corruption and for believing in “superstitions” instead of Marxism.

There is extensive material related to Taiwan and military matters, such as commentary about Taiwan's military capabilities and details about a new Chinese fighter jet. The Chinese word for Taiwan (台湾) alone appears more than 15,000 times in the data, a search by TechCrunch shows.
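The article does not say how that search was run. As a rough illustration only, a term count over exported records might look like the Python sketch below. It assumes one JSON object per line and a text field named `content`; both the layout and the field name are hypothetical, not taken from the leaked data.

```python
import json


def count_term(json_lines, term, field="content"):
    """Count occurrences of `term` in one text field across JSON-lines records.

    The record layout and the default field name are assumptions made
    for illustration; the real dataset's schema is not described here.
    """
    total = 0
    for line in json_lines:
        record = json.loads(line)
        # str.count tallies non-overlapping occurrences within the field.
        total += str(record.get(field, "")).count(term)
    return total


# Made-up sample records, not drawn from the dataset.
sample_lines = [
    json.dumps({"content": "台湾 appears once here"}, ensure_ascii=False),
    json.dumps({"content": "台湾 and 台湾 again"}, ensure_ascii=False),
    json.dumps({"content": "no match in this record"}, ensure_ascii=False),
]

print(count_term(sample_lines, "台湾"))  # 3
```

Counting occurrences rather than matching records means a single entry that repeats the term is counted more than once, which is one plausible reading of "appears more than 15,000 times."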

Subtle dissent appears to be targeted, too. One snippet included in the database is an anecdote about the fleeting nature of power that uses the popular Chinese idiom “When the tree falls, the monkeys scatter.”

Power transfers are a particularly sensitive issue in China thanks to its authoritarian political system.

Built for “public opinion work”

The dataset does not include any information about its creators. But it does say that it is intended for “public opinion work,” which offers a strong clue that it is meant to serve Chinese government goals.

Michael Caster, the Asia program manager of rights organization Article 19, explained that “public opinion work” is overseen by a powerful Chinese government regulator, the Cyberspace Administration of China (CAC), and typically refers to censorship and propaganda efforts.

The end goal is to ensure that Chinese government narratives are protected online, while any alternative views are purged. Chinese President Xi Jinping has himself described the internet as the “frontline” of the CCP's “public opinion work.”

Repression is getting smarter

The dataset examined by TechCrunch is the latest evidence that authoritarian governments are seeking to leverage AI for repressive purposes.

OpenAI released a report last month revealing that an unidentified actor, likely operating from China, used generative AI to monitor social media conversations, particularly those advocating human rights protests against China, and forward them to the Chinese government.

Contact Us

If you know more about how AI is used in state oppression, you can contact Charles Rollet securely on Signal at charlesrollet.12. You can also contact TechCrunch via SecureDrop.

OpenAI also found the technology being used to generate comments highly critical of a prominent Chinese dissident, Cai Xia.

Traditionally, China's censorship methods have relied on more basic algorithms that automatically block content mentioning blacklisted terms, such as “Tiananmen” or “Xi Jinping,” as many users experienced when using DeepSeek for the first time.

But newer AI technology, such as LLMs, can make censorship more efficient by finding even subtle criticism at vast scale. Some AI systems can also keep improving as they take in more and more data.
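To illustrate the limitation of the keyword approach described above, here is a minimal Python sketch of a blocklist filter. It is not the system in the leaked data or any real product: the function name and the two-term list are hypothetical, with the terms taken from the examples named in this article.

```python
# Hypothetical blocklist built from the two example terms named in the article.
BLOCKLIST = ("Tiananmen", "Xi Jinping")


def keyword_flag(text, blocklist=BLOCKLIST):
    """Return True if any blocklisted term appears in the text (case-insensitive)."""
    lowered = text.casefold()
    return any(term.casefold() in lowered for term in blocklist)


explicit = "A post that mentions Tiananmen by name."
indirect = "When the tree falls, the monkeys scatter."

print(keyword_flag(explicit))  # True
print(keyword_flag(indirect))  # False
```

The idiom contains no listed term, so substring matching passes over it. That gap between literal terms and implied meaning is the kind of subtle criticism the article says an LLM-based classifier is being trained to catch.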

“I think it's crucial to highlight how AI-driven censorship is evolving, making state control over public discourse even more sophisticated, especially at a time when Chinese AI models such as DeepSeek are making waves,” Xiao, the Berkeley researcher, told TechCrunch.


