June 16, 2024

Decoding China’s Ambitious Generative AI Regulations

By Sihao Huang and Justin Curl

On April 11th, 2023, China’s top internet regulator proposed new rules for generative AI. The draft builds on previous regulations on deep synthesis technology, which contained detailed provisions on user identity registration, the creation of a database of undesirable inputs, and even the generation of “special objects and scenes” that may harm national security. 

Whereas past regulations from the Cyberspace Administration of China (CAC) focused on harmful outputs that threatened national security, this new draft regulation goes a step further. It mandates that models be "accurate and true," adhere to a particular worldview, and avoid discrimination by race, faith, and gender. The document also introduces specific constraints on how these models are built. Meeting these requirements means tackling open problems in AI like hallucination, alignment, and bias, for which robust solutions do not currently exist. 

The news coverage in the United States thus far has been relatively superficial and misses the breadth and complexity of the proposed regulations. An article in The Wall Street Journal, for example, homes in on a requirement to adhere to a particular worldview (Art. 4, Sec. 1), a provision that already appeared in the 2016 cybersecurity law, and focuses primarily on China's move toward AI censorship. A Bloomberg story examines the draft regulation's potential impact on the speed of AI development, highlighting the need for security reviews (Art. 6).

This post aims to highlight the aspects of the draft regulation that are novel in both the Chinese and international contexts. (Our full unofficial translation of the document from Chinese to English is available here.)

Article 4 Section 1: Models must adhere to a particular worldview. 

This provision specifies that AI-generated content shall "embody the core socialist values" and not threaten the social order. GLM-130B, an open bilingual model from researchers at Tsinghua University, included a similar requirement in its license: the model weights may not be used to undermine China's national interests. Interactions with Baidu's Ernie Bot preview what this might look like: in response to the question, "Dad and Mom get married, does this count as inbreeding/intermarriage?" the generative model responds with China's laws on intermarriage. Research has shown that word embeddings trained on Baidu (censored) and on Chinese Wikipedia (uncensored) produce very different associations between adjectives and people, historical events, and words like democracy, freedom, or equality. One could imagine a future where different countries deploy generative models trained on customized corpora that encode drastically different worldviews and value systems.

Article 4 Section 2: Generative AI providers must take active measures to prevent discrimination by race, ethnicity, faith, gender, and other categories. 

This provision is unusual in how explicitly it spells out the government's definition of discrimination. Although previous AI ethics documents from the Chinese government or government-affiliated organizations have listed preventing discrimination as a desired goal, this is one of the first instances of a major document explicitly listing unacceptable forms of discrimination. Notably, however, a similar provision outlawing algorithmic discrimination in pricing and recommendation systems was removed from a 2021 regulation between its draft and enacted versions.

Article 4 Section 3: Providers of AI services cannot leverage their algorithms, data, or platform to engage in unfair competition. 

Unfair competition is a fairly ambiguous term, and this section's language is broad enough to be applied in many contexts. If Alibaba's chat app promoted Alibaba products, for example, this regulation could very well be used against the company, but it all depends on how this section is enforced. Researchers have previously argued that such ambiguity is by design, suggesting that vague cybersecurity and data protection laws give the Chinese government wide leeway in interpreting them. 

Article 4 Section 4: Content generated by AI should be “accurate and true,” and measures must be taken to prevent the generation of false information.

On February 16, 2023, a fake news article generated by ChatGPT in the style of a Chinese government press release began circulating online. The incident led to a police investigation and came a week before China's ban on ChatGPT. This provision attempts to regulate an open technical problem: how to ensure that generative AI does not hallucinate and provide false or misleading responses. Given the challenges involved in building robust AI, it remains to be seen how China enforces this rule and how it shapes incentives for Chinese AI companies. 

Article 4 Section 5: Generative AI should not harm people’s mental health, infringe on intellectual property, or infringe on the right to publicity (i.e., someone’s likeness). 

These issues of mental health, intellectual property, and privacy have also surfaced outside China. For instance, a woman reportedly blamed an AI chatbot for her spouse's suicide. There have also been reports of AI-driven defamation, pornography, and fake images, as well as a lawsuit by artists against the makers of Stable Diffusion. Again, it will be interesting to see how companies in China respond to these rules. Privacy-preserving AI, for instance, remains an active area of research, and there are currently few watertight solutions to prevent these harms.

Article 5: Individuals and organizations using generative AI models to provide services will be held legally responsible for content that violates these regulations. 

Clearly assigning liability could help avoid the kind of confusion currently playing out in the United States over alleged intellectual property violations and illegal content produced by generative AI models. Clearly defined legal consequences, however, may also come at the cost of slower model adoption and diffusion, as organizations grow wary of legal repercussions, especially given the potential technical difficulty of fully satisfying the requirements outlined in Article 4.

Note that this document applies to service providers: these restrictions do not apply to AI development work as long as the models are not publicly deployed. This may lead to a growing gap in AI capabilities between China's public-facing models and its cutting-edge research models, as the latter could use a much broader range of training data and face less onerous restrictions.

Article 6: Generative AI models must undergo security assessments and receive government approval before being offered to users.

This provision adopts a pre-clearance framework for the use of generative AI systems. While the regulation does not specify what would go into the security assessments and approval process, we hope it will build on fairness research about auditing and documenting models. As noted previously, this process could slow adoption; it also serves as a mechanism for the CAC to enforce compliance with its national security rules and ensure alignment with the Chinese Communist Party's ideology.

Article 7: Strict requirements for pre-training data. 

The draft sets specific requirements about how the models should be built, not just their desired outputs. It states that the training data must (i) comply with the Network Security Law (e.g., not contain anti-government material), (ii) exclude content that infringes upon intellectual property, (iii) be used only with the consent of the individuals concerned if it contains personal information, and finally, (iv) "guarantee the authenticity, accuracy, objectivity, and diversity of the data." 

The first three stipulations may greatly restrict the amount of data available for training large generative models. For example, much of the pre-training data for GPT-3 (the data for ChatGPT and GPT-4 is not publicly available) and Stable Diffusion came from scraping text and images from the web. Because web-scraped datasets like these include copyrighted books, images, and personal information, they would require extremely expensive cleaning before Chinese companies could use them in compliance with these regulations. 

The fourth stipulation is also significant. Although ensuring accurate and diverse training data may promote fairness by reducing performance disparities between demographic groups, collecting that data may place undue burdens on minority groups. Another open question is how authenticity, accuracy, objectivity, and diversity will be measured, and whether these criteria will be enforced through national standards or company-designed rules.

Article 15: For non-compliant generations, in addition to taking measures like content filters, model optimization training should be used to prevent re-generation of the same issues within three months.

This article addresses the concern that AI service providers will adopt band-aid solutions to specific issues. Instead, providers must fundamentally modify the model within three months to prevent the issue from recurring. We have seen such quick fixes before: for instance, when Bing AI was limited to five messages per session to stop it from producing strange and unsettling responses, or when Baidu's Ernie Bot refused to answer any question about Chinese President Xi Jinping. Presumably, this provision would push companies away from such sweeping bans and toward training their models to generate acceptable responses.

We were repeatedly surprised by the CAC's apparent willingness to impose considerable compliance costs on service providers who use generative AI models. If fully enforced, provisions mandating strict pre-training data controls, security assessments, model optimization, and de-biasing would be extremely expensive. This expense may slow adoption and progress in deployed, public-facing AI systems. Yet just as complying with China's cybersecurity laws required Chinese tech firms to build extensive content moderation systems, these new rules may incentivize companies to invest more in building robust, interpretable, and aligned generative AI systems, albeit ones that espouse the country's values.

In summary, China is moving quickly on AI governance and appears willing to develop regulations that take firm positions on difficult questions about misinformation, privacy, intellectual property, data, and unfair competition. Furthermore, China has been quick to put these regulations into action, entrusting them to a powerful enforcement authority. Many countries are now looking for ways to govern these new technologies. We would be concerned, though not surprised, if some turn to China as a model. 

Justin Curl and Sihao Huang are both currently studying at Tsinghua University as Schwarzman Scholars from the U.S. Justin graduated with a degree in Computer Science from Princeton University and works with Mihir Kshirsagar on issues of AI governance. Sihao graduated with dual degrees in Physics and Electrical Engineering from MIT. We thank Mihir for his editorial suggestions.