The Biden Administration released an Executive Order (EO) that aims to shape the safe, secure, and trustworthy development of artificial intelligence (AI) in our society. The sweeping EO requires many safeguards on AI, directing agencies to develop and use standards and guidelines to help ensure data security, combat discrimination and inequality, assess the safety of AI models, and track private sector development of AI systems. This post focuses on the EO’s initiatives related to vulnerability management and red teaming. (Check out the Center’s summary of the EO’s security provisions here.)

The EO refers to the use of “AI red-teaming” in several places as a beneficial practice to ensure the safety and security of AI. While we agree that red teaming is an important security activity, it’s critical not to overemphasize it to the detriment of other safeguards. In addition, the way the EO conceives of AI red teaming seems broader than how the term “red teaming” is usually defined in the context of security. As AI red teaming gains traction, definitions and legal protections may need to evolve with it.

Broad definition of “AI red-teaming”

Most traditional definitions of red teaming center on emulating adversarial attacks and exploitation capabilities against an organization’s security posture. See, for example, the definition by the National Institute of Standards and Technology (NIST). 

By contrast, the EO’s definition of “AI red-teaming” is clearly not limited to security and does not necessarily require adversarial attacks. Instead, under the EO, the term means structured testing to find flaws and vulnerabilities. This testing often – but not necessarily – uses adversarial methods and involves working with AI developers to “identify flaws and vulnerabilities, such as harmful or discriminatory outputs from an AI system, unforeseen or undesirable system behaviors, limitations, or potential risks associated with the misuse of the system.” [Sec. 3(d)] This definition is reflected in the Office of Management and Budget (OMB) draft memo that accompanies the EO.

AI red teaming under the EO becomes testing of an AI system, which the EO also defines broadly, for both security and functionality within established guardrails. It is possible that NIST guidance on red teaming – described below – will narrow down the scope somewhat, but it’s important to recognize that the EO uses AI red teaming as an umbrella term for potentially many kinds of tests.

Red teaming, as traditionally defined, should not be a substitute for penetration testing, vulnerability disclosure programs, bug bounties, or automated scanning. However, the EO’s definition of red teaming arguably encompasses those and other types of testing activities. While it is not yet clear what the mix of testing requirements will be, we caution against conflating different types of testing as the EO is implemented. Although these activities share an overarching purpose, each has a distinct approach and methodology. In addition, testing itself should be only one component of a security program that includes risk assessment, vulnerability management, incident response plans, and other safeguards.

Aside from red teaming, the EO includes many other elements related to security, such as requiring NIST to develop best practices for developing safe, secure, and trustworthy AI. Although few of the EO’s additional requirements are directly related to vulnerability discovery and management, we encourage and expect the inclusion of these activities within the broader AI security guidance required under the EO. 

Red teaming AI under the EO

Guidelines: The EO directs NIST to establish appropriate guidelines, procedures, and processes to enable developers of AI “to conduct AI red-teaming tests to enable deployment of safe, secure and trustworthy systems.” [Sec. 4.1(a)(ii)] Within this red teaming effort, the EO requires the development of guidelines for assessing the safety, security, and trustworthiness of dual-use foundation models, as well as the creation of testing environments to support the development of safe, secure, and trustworthy AI models. [Sec. 4.1(a)(ii)(A)-(B)] The EO sets a deadline of July 16, 2024, for these guidelines.

Premarket reporting to government: The EO requires, under the Defense Production Act, companies developing dual-use foundation models [defined at Sec. 3(k)] to report the results of testing under NIST’s red teaming guidance to the federal government on an ongoing basis. This includes the discovery of software vulnerabilities and the development of associated exploits, among other items not directly related to cybersecurity, such as the possibility of self-replication. [Sec. 4.2(a)(i)(C)] The EO sets a deadline of Jan. 28, 2024, for this requirement, as well as for related guidance.

Recommendations to agencies: The EO directs the establishment of an interagency council to provide guidance on government use of AI. This council will work with the Cybersecurity and Infrastructure Security Agency (CISA) to develop recommendations to agencies on “external testing for AI, including AI red-teaming for generative AI.” [Sec. 10.1(b)(viii)(A)] The EO sets a deadline of Mar. 28, 2024, for these recommendations.

Protections needed for good faith AI guardrail hacking

As demonstrated by the EO’s inclusion of both security and non-security tests in its definition of AI red teaming, an important emerging practice is hacking AI models to test whether they can be made to violate guardrails – program controls designed to keep the AI model functioning as the developer intended. This could include, for example, crafting prompts that cause large language models to generate unacceptable responses, such as discrimination, harmful advice, erroneous calculations, or abusive language. This was also the scope of the White House-endorsed AI Village’s generative red team event at DEF CON 31, for which Harley Geiger and other Hacking Policy Council members served among the judges.

It is positive that the EO recognizes the value of testing AI systems for these kinds of negative outcomes. As guidance for this activity is developed, the Administration should consider how to appropriately enable red teaming by both authorized sources and independent researchers. This would bring in different perspectives, produce better results, and promote a collaborative culture of alignment. However, discussion of the role of such independent and external research is glaringly absent from the EO.

Since “bias bounties” and guardrail hacking are not strictly centered on cybersecurity, it is not clear whether they enjoy the same legal protections now in place for independent security research and vulnerability disclosure, such as the exemption for good faith security research under DMCA Section 1201 or the liability protections for information sharing under the Cybersecurity Information Sharing Act of 2015. This lack of clarity poses potential challenges for independent researchers aiming to uncover bias and discriminatory outputs in AI systems.

Agencies should clarify the extent to which protections for good faith security research include guardrail hacking. Where a gap exists, agencies should extend protections to cover guardrail hacking. Addressing this will help ensure that ethical hackers can conduct AI red teaming, as defined under the EO, and disclose their findings to improve the alignment of AI models with society without fearing legal consequences.

Next steps: guidance and implementation

The inclusion of red teaming in the EO reflects a broader commitment to testing AI systems on a variety of metrics. The scope and methodology of AI red teaming will be developed in NIST guidance as part of the EO implementation. We look forward to working with the Administration and NIST to ensure a safer, more trustworthy AI landscape through red teaming – as broadly defined! 

Harley Geiger & Tanvi Chopra
