Privacy Challenge as Models Scale: Training Efficiency and Amplified Risks

Ruixuan Liu
Emory University

Time: 2024-10-30, 12:00 - 13:00 ET
Location: Rice 540 and Zoom

Abstract As AI models scale-up, developers face mounting challenges in privacy-preserving training efficiency while confronting increased vulnerability to attacks. This talk will explore these challenges by examining both defensive solutions and attack risks, providing a in-depth view of the evolving landscape of privacy in large-scale AI ecosystems (e.g., LLMs).

First, we provide a scalable solution for privacy-preserving training when model size scales up. Specifically, we introduce a novel differential privacy (DP) optimization method that eliminates the need for hyperparameter tuning by automatically adapting learning rate schedules in a privatized way. Our approach significantly reduces computational overhead for DP tuning while approaches the performance of non-DP models across multiple tasks, without sacrificing privacy guarantee.

Next, we delve into the risks associated with the pre-training and fine-tuning paradigm for large models. We present PreCurious, a framework that reveals a new privacy attack surface where pre-trained models can be manipulated to increase the privacy risks during the fine-tuning stage. PreCurious demonstrates how attackers can exploit these models to amplify risks like membership inference and data extraction, even under common defenses. PreCurious raises critical concerns about the integrity of community-sourced pre-trained models and highlights the limitations of common privacy defenses.

Bio Ruixuan Liu is a postdoctoral fellow in the Assured Information Management and Sharing (AIMS) group at Emory University’s Computer Science Department, supervised by Prof. Li Xiong. She completed her Ph.D. at Renmin University of China (2018–2023), focusing on privacy-preserving federated learning. Ruixuan has gained valuable research experience as an intern at Amazon AWS AI Lab and Microsoft Research Asia. Her work emphasizes strong privacy guarantees with practical viability, aiming to develop machine learning systems that are highly efficient, inclusive, and utility-driven. Her research has been published in top-tier venues, including CCS, SIGKDD, and AAAI. She has received numerous awards, including Outstanding Doctoral Graduate, the National Scholarship, and 2nd Prize in the CIKM 2022 AnalytiCup Competition.