UVa AIML Seminar
The AI and Machine Learning Seminar @ UVa

Invisible Foes: Crafting and Cracking AI in the Shadows of Language - Poison Data and Jailbreak Prompts for LLMs


Furong Huang
Department of Computer Science, University of Maryland

Time: 2024-04-17, 12:00 - 13:00 ET
Location: Thornton A238 and Zoom

Abstract Large language models (LLMs) face critical threats from poisoned fine-tuning data and jailbreak prompts, both of which jeopardize AI dependability. This talk examines these covert challenges and the methodologies that exploit AI at both the training and deployment stages. First, it introduces a novel technique for stealthily poisoning Vision-Language Models (VLMs) by subtly altering images to mislead their conceptual understanding. The attack not only causes objects to be misidentified but also crafts persuasive narratives, exposing a critical vulnerability through which VLMs could subliminally alter beliefs and values and underscoring the need for urgent ethical safeguards in AI deployment. Turning to the deployment stage, the talk addresses jailbreak prompts, which are designed to manipulate LLMs into unauthorized actions or outputs. We will discuss our recent work, AutoDAN, an interpretable, gradient-based adversarial attack that combines the readability of manual jailbreak prompts with the efficiency of automatic adversarial attacks. AutoDAN generates prompts that bypass perplexity filters and generalize to a wide range of harmful behaviors, challenging the optimism surrounding current defenses against LLM vulnerabilities.
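For readers unfamiliar with the perplexity filters mentioned above, the sketch below shows roughly how such a defense screens prompts; readable AutoDAN-style prompts aim to stay under the threshold, unlike gibberish token-level suffixes. This is a minimal illustration, assuming GPT-2 via the Hugging Face transformers library as the scoring model; the threshold value is illustrative and not taken from the talk.

```python
# Minimal sketch of a perplexity-based prompt filter (illustrative only).
# Assumes Hugging Face `transformers` with GPT-2 as the scoring model.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(prompt: str) -> float:
    """Compute the language-model perplexity of a prompt under GPT-2."""
    enc = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        # Passing the input ids as labels yields the mean cross-entropy loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

def passes_filter(prompt: str, threshold: float = 200.0) -> bool:
    """Accept prompts whose perplexity is at or below the (assumed) threshold."""
    return perplexity(prompt) <= threshold

# Fluent, human-readable prompts typically score well below the threshold,
# while random-looking adversarial suffixes score far above it.
print(passes_filter("Please summarize this article about solar energy."))
```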

Bio Furong Huang is an Assistant Professor in the Department of Computer Science at the University of Maryland. She received her Ph.D. in electrical engineering and computer science from UC Irvine in 2016, after which she spent one year as a postdoctoral researcher at Microsoft Research NYC. She works on statistical and trustworthy machine learning, foundation models, and reinforcement learning, with a specialization in domain adaptation, algorithmic robustness, and fairness. With a focus on high-dimensional statistics and sequential decision-making, she develops efficient, robust, scalable, sustainable, ethical, and responsible machine learning algorithms. Her contributions have been recognized with awards including best paper awards, the MIT Technology Review Innovators Under 35 Asia Pacific, the MLconf Industry Impact Research Award, the NSF CRII Award, the Microsoft Accelerate Foundation Models Research Award, the Adobe Faculty Research Award, three JP Morgan Faculty Research Awards, and selection as a finalist for AI Researcher of the Year (AI in Research) at the Women in AI Awards North America.