{"id":25672,"date":"2025-01-14T05:00:58","date_gmt":"2025-01-14T13:00:58","guid":{"rendered":"https:\/\/www.palada.net\/index.php\/2025\/01\/14\/news-19395\/"},"modified":"2025-01-14T05:00:58","modified_gmt":"2025-01-14T13:00:58","slug":"news-19395","status":"publish","type":"post","link":"https:\/\/www.palada.net\/index.php\/2025\/01\/14\/news-19395\/","title":{"rendered":"3 takeaways from red teaming 100 generative AI products"},"content":{"rendered":"<p><strong>Credit to Author: Blake Bullwinkel and Ram Shankar Siva Kumar| Date: Mon, 13 Jan 2025 16:00:00 +0000<\/strong><\/p>\n<p>Microsoft\u2019s AI red team is excited to share our whitepaper, &#8220;<a href=\"https:\/\/aka.ms\/AIRTLessonsPaper\">Lessons from Red Teaming 100 Generative AI Products<\/a>.&#8221;<\/p>\n<p>The AI red team was formed in 2018 to address the growing landscape of AI safety and security risks. Since then, we have expanded the scope and scale of our work significantly. We are one of the first red teams in the industry to cover both security and responsible AI, and red teaming has become a key part of Microsoft\u2019s approach to generative AI product development. Red teaming is the first step in identifying potential harms and is followed by important initiatives at the company to measure, manage, and govern AI risk for our customers. Last year, we also announced&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/02\/22\/announcing-microsofts-open-automation-framework-to-red-team-generative-ai-systems\/\">PyRIT<\/a>&nbsp;(The Python Risk Identification Tool for generative AI), an open-source toolkit to help researchers identify vulnerabilities in their own AI systems.<\/p>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/airt_ops_pie_chart-1.webp\" alt=\"Pie chart showing the percentage breakdown of products tested by the Microsoft AI red team (AIRT). 
As of October 2024, we have conducted more than 80 operations covering more than 100 products.\" class=\"wp-image-137017 webp-format\" srcset=\"\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/airt_ops_pie_chart-1.webp\"><figcaption class=\"wp-element-caption\">Pie chart showing the percentage breakdown of products tested by the Microsoft AI red team. As of October 2024, we had red teamed more than 100 generative AI products.<\/figcaption><\/figure>\n<p>With a focus on our expanded mission, we have now red-teamed more than 100 generative AI products. The whitepaper we are now releasing provides more detail about our approach to AI red teaming and includes the following highlights:<\/p>\n<ul class=\"wp-block-list\">\n<li>Our AI red team ontology, which we use to model the main components of a cyberattack including adversarial or benign actors, TTPs (Tactics, Techniques, and Procedures), system weaknesses, and downstream impacts. This ontology provides a cohesive way to interpret and disseminate a wide range of safety and security findings.<\/li>\n<li>Eight main lessons learned from our experience red teaming more than 100 generative AI products. These lessons are geared towards security professionals looking to identify risks in their own AI systems, and they shed light on how to align red teaming efforts with potential harms in the real world.<\/li>\n<li>Five case studies from our operations, which highlight the wide range of vulnerabilities that we look for including traditional security, responsible AI, and psychosocial harms. 
Each case study demonstrates how our ontology is used to capture the main components of an attack or system vulnerability.<\/li>\n<\/ul>\n<div class=\"wp-block-msxcm-cta-block\" data-moray data-bi-an=\"CTA Block\">\n<div class=\"card d-block mx-ng mx-md-0\">\n<div class=\"row no-gutters material-color-brand-dark\">\n<div class=\"col-md-4\"> \t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/CLO25-Security-Lifestyle-Getty-1084167628-1024x683.jpg\" class=\"card-img img-object-cover\" alt=\"Two colleagues collaborating at a desk.\" srcset=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/CLO25-Security-Lifestyle-Getty-1084167628-1024x683.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/CLO25-Security-Lifestyle-Getty-1084167628-300x200.jpg 300w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/CLO25-Security-Lifestyle-Getty-1084167628-768x513.jpg 768w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/CLO25-Security-Lifestyle-Getty-1084167628-1536x1025.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/CLO25-Security-Lifestyle-Getty-1084167628-2048x1367.jpg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t<\/div>\n<div class=\"d-flex col-md\">\n<div class=\"card-body align-self-center p-4 p-md-5\">\n<h2>Lessons from Red Teaming 100 Generative AI Products<\/h2>\n<div class=\"mb-3\">\n<p>Discover more about our approach to AI red teaming.<\/p>\n<\/p><\/div>\n<div class=\"link-group\"> \t\t\t\t\t\t\t<a href=\"https:\/\/aka.ms\/AIRTLessonsPaper\" class=\"btn btn-link text-decoration-none p-0\" > \t\t\t\t\t\t\t\t<span>Read the whitepaper<\/span> \t\t\t\t\t\t\t\t<span class=\"glyph-append glyph-append-chevron-right glyph-append-xsmall\"><\/span> 
							<\/a> 						<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2 class=\"wp-block-heading\" id=\"microsoft-ai-red-team-tackles-a-multitude-of-scenarios\">Microsoft AI red team tackles a multitude of scenarios<\/h2>\n<p>Over the years, the <a href=\"https:\/\/learn.microsoft.com\/security\/ai-red-team\/\" target=\"_blank\" rel=\"noreferrer noopener\">AI red team<\/a> has tackled a wide assortment of scenarios that other organizations have likely encountered as well. We focus on vulnerabilities most likely to cause harm in the real world, and our whitepaper shares case studies from our operations that highlight how we have done this in four scenarios, including security, responsible AI, dangerous capabilities (such as a model&#8217;s ability to generate hazardous content), and psychosocial harms. As a result, we are able to recognize a variety of potential cyberthreats and adapt quickly when confronting new ones.<\/p>\n<p>This mission has given our red team a breadth of experiences to skillfully tackle risks regardless of:<\/p>\n<ul class=\"wp-block-list\">\n<li>System type, including Microsoft Copilot, models embedded in systems, and open-source models.<\/li>\n<li>Modality, whether text-to-text, text-to-image, or text-to-video.<\/li>\n<li>User type\u2014enterprise user risk, for example, is different from consumer risk and requires a unique red teaming approach. Niche audiences, such as for a specific industry like healthcare, also deserve a nuanced approach.&nbsp;<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\" id=\"top-three-takeaways-from-the-whitepaper\">Top three takeaways from the whitepaper<\/h2>\n<p>AI red teaming is a practice for probing the safety and security of generative AI systems.&nbsp;Put simply, we &#8220;break&#8221; the technology so that others can build it back stronger. Years of red teaming have given us invaluable insight into the most effective strategies. 
In reflecting on the eight lessons discussed in the whitepaper, we can distill three top takeaways that business leaders should know.<\/p>\n<h3 class=\"wp-block-heading\" id=\"takeaway-1-generative-ai-systems-amplify-existing-security-risks-and-introduce-new-ones\">Takeaway 1: Generative AI systems amplify existing security risks and introduce new ones<\/h3>\n<p>The integration of generative AI models into modern applications has introduced novel cyberattack vectors. However, many discussions around AI security overlook existing vulnerabilities. AI red teams should pay attention to cyberattack vectors both old and new.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Existing security risks<\/strong>: Application security risks often stem from improper security engineering practices including outdated dependencies, improper error handling, credentials in source, lack of input and output sanitization, and insecure packet encryption. One of the case studies in our whitepaper describes how an outdated FFmpeg component in a video processing AI application introduced a well-known security vulnerability called server-side request forgery (SSRF), which could allow an adversary to escalate their system privileges.<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/airt_ssrf_vuln-1-1024x433.webp\" alt=\"Flow chart showing an SSRF vulnerability in the GenAI application from red team case study.\" class=\"wp-image-137018 webp-format\" srcset=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/airt_ssrf_vuln-1-1024x433.webp 1024w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/airt_ssrf_vuln-1-300x127.webp 300w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/airt_ssrf_vuln-1-768x325.webp 768w, 
https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/airt_ssrf_vuln-1-1536x650.webp 1536w, https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/airt_ssrf_vuln-1-2048x867.webp 2048w\" data-orig-src=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/wp-content\/uploads\/2025\/01\/airt_ssrf_vuln-1-1024x433.webp\"><figcaption class=\"wp-element-caption\">Illustration of the SSRF vulnerability in the video-processing generative AI application.<\/figcaption><\/figure>\n<ul class=\"wp-block-list\">\n<li><strong>Model-level weaknesses<\/strong>: AI models have expanded the cyberattack surface by introducing new vulnerabilities. Prompt injections, for example, exploit the fact that AI models often struggle to distinguish between system-level instructions and user data. Our whitepaper includes a red teaming case study about how we used prompt injections to trick a vision language model.<\/li>\n<\/ul>\n<blockquote class=\"wp-block-quote blockquote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Red team tip<\/strong>: AI red teams should be attuned to new cyberattack vectors while remaining vigilant for existing security risks. AI security best practices should include basic cyber hygiene.<\/p>\n<\/blockquote>\n<h3 class=\"wp-block-heading\" id=\"takeaway-2-humans-are-at-the-center-of-improving-and-securing-ai\">Takeaway 2: Humans are at the center of improving and securing AI<\/h3>\n<p>While automation tools are useful for creating prompts, orchestrating cyberattacks, and scoring responses, red teaming can\u2019t be automated entirely. 
AI red teaming relies heavily on human expertise.<\/p>\n<p>Humans are important for several reasons, including:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Subject matter expertise<\/strong>: LLMs are capable of evaluating whether an AI model response contains hate speech or explicit sexual content, but they\u2019re not as reliable at assessing content in specialized areas like medicine, cybersecurity, and CBRN (chemical, biological, radiological, and nuclear). These areas require subject matter experts who can evaluate content risk for AI red teams.<\/li>\n<li><strong>Cultural competence<\/strong>: Modern language models use primarily English training data, performance benchmarks, and safety evaluations. However, as AI models are deployed around the world, it is crucial to design red teaming probes that not only account for linguistic differences but also redefine harms in different political and cultural contexts. These methods can be developed only through the collaborative effort of people with diverse cultural backgrounds and expertise.<\/li>\n<li><strong>Emotional intelligence<\/strong>: In some cases, emotional intelligence is required to evaluate the outputs of AI models. One of the case studies in our whitepaper discusses how we are probing for psychosocial harms by investigating how chatbots respond to users in distress. 
Ultimately, only humans can fully assess the range of interactions that users might have with AI systems in the wild.<\/li>\n<\/ul>\n<blockquote class=\"wp-block-quote blockquote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Red team tip<\/strong>: Adopt tools like PyRIT to scale up operations but keep humans in the red teaming loop for the greatest success at identifying impactful AI safety and security vulnerabilities.<\/p>\n<\/blockquote>\n<h3 class=\"wp-block-heading\" id=\"takeaway-3-defense-in-depth-is-key-for-keeping-ai-systems-safe\">Takeaway 3: Defense in depth is key for keeping AI systems safe<\/h3>\n<p>Numerous mitigations have been developed to address the safety and security risks posed by AI systems. However, it is important to remember that mitigations do not eliminate risk entirely. Ultimately, AI red teaming is a continuous process that should adapt to the rapidly evolving risk landscape and aim to raise the cost of successfully attacking a system as much as possible.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Novel harm categories<\/strong>: As AI systems become more sophisticated, they often introduce entirely new harm categories. For example, one of our case studies explains how we probed a state-of-the-art LLM for risky persuasive capabilities. AI red teams must constantly update their practices to anticipate and probe for these novel risks.<\/li>\n<li><strong>Economics of cybersecurity<\/strong>: Every system is vulnerable because humans are fallible, and adversaries are persistent. However, you can deter adversaries by raising the cost of attacking a system beyond the value that would be gained. 
One way to raise the cost of cyberattacks is by using break-fix cycles.<sup>1<\/sup> This involves undertaking multiple rounds of red teaming, measurement, and mitigation\u2014sometimes referred to as &#8220;purple teaming&#8221;\u2014to strengthen the system to handle a variety of attacks.<\/li>\n<li><strong>Government action<\/strong>: Industry action to defend against cyberattackers and failures is one side of the AI safety and security coin. The other side is government action that could deter and discourage these broader failures. Both public and private sectors need to demonstrate commitment and vigilance, ensuring that cyberattackers no longer hold the upper hand and society at large can benefit from AI systems that are inherently safe and secure.<\/li>\n<\/ul>\n<blockquote class=\"wp-block-quote blockquote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Red team tip<\/strong>: Continually update your practices to account for novel harms, use break-fix cycles to make AI systems as safe and secure as possible, and invest in robust measurement and mitigation techniques.<\/p>\n<\/blockquote>\n<h2 class=\"wp-block-heading\" id=\"advance-your-ai-red-teaming-expertise\">Advance your AI red teaming expertise<\/h2>\n<p>The \u201cLessons From Red Teaming 100 Generative AI Products\u201d whitepaper includes our AI red team ontology, additional lessons learned, and five case studies from our operations. 
We hope you will find the paper and the ontology useful in organizing your own AI red teaming exercises and developing further case studies by taking advantage of&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/02\/22\/announcing-microsofts-open-automation-framework-to-red-team-generative-ai-systems\/\">PyRIT<\/a>, our open-source automation framework.<\/p>\n<p>Together, the cybersecurity community can refine its approaches and share best practices to effectively address the challenges ahead.&nbsp;<a href=\"https:\/\/aka.ms\/AIRTLessonsPaper\">Download our red teaming whitepaper<\/a>&nbsp;to read more about what we\u2019ve learned. As we progress along our own continuous learning journey, we would welcome your feedback and would love to hear about your own AI red teaming experiences.<\/p>\n<h2 class=\"wp-block-heading\" id=\"learn-more-with-microsoft-security\">Learn more with Microsoft Security<\/h2>\n<p>To learn more about Microsoft Security solutions, visit our&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/security\/business\" target=\"_blank\" rel=\"noreferrer noopener\">website<\/a>.&nbsp;Bookmark the&nbsp;<a href=\"https:\/\/www.microsoft.com\/security\/blog\/\" target=\"_blank\" rel=\"noreferrer noopener\">Security blog<\/a>&nbsp;to keep up with our expert coverage on security matters. 
Also, follow us on LinkedIn (<a href=\"https:\/\/www.linkedin.com\/showcase\/microsoft-security\/\">Microsoft Security<\/a>) and X (<a href=\"https:\/\/twitter.com\/@MSFTSecurity\" target=\"_blank\" rel=\"noreferrer noopener\">@MSFTSecurity<\/a>)&nbsp;for the latest news and updates on cybersecurity.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n<p>\u00b9 <a href=\"https:\/\/arxiv.org\/abs\/2407.13833\">Phi-3 Safety Post-Training: Aligning Language Models with a &#8220;Break-Fix&#8221; Cycle<\/a><\/p>\n<p>The post <a href=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2025\/01\/13\/3-takeaways-from-red-teaming-100-generative-ai-products\/\">3 takeaways from red teaming 100 generative AI products<\/a> appeared first on <a href=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\">Microsoft Security Blog<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p><strong>Credit to Author: Blake Bullwinkel and Ram Shankar Siva Kumar| Date: Mon, 13 Jan 2025 16:00:00 +0000<\/strong><\/p>\n<p>Since 2018, Microsoft&#8217;s AI Red Team has probed generative AI products for critical safety and security vulnerabilities. 
Read our latest blog for three lessons we&#8217;ve learned along the way.<\/p>\n<p>The post <a href=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2025\/01\/13\/3-takeaways-from-red-teaming-100-generative-ai-products\/\">3 takeaways from red teaming 100 generative AI products<\/a> appeared first on <a href=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\">Microsoft Security Blog<\/a>.<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[10759,10378],"tags":[],"class_list":["post-25672","post","type-post","status-publish","format-standard","hentry","category-microsoft","category-security"],"_links":{"self":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/25672","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/comments?post=25672"}],"version-history":[{"count":0,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/posts\/25672\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/media?parent=25672"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/categories?post=25672"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.palada.net\/index.php\/wp-json\/wp\/v2\/tags?post=25672"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}