Rethinking Storage System Design for Modern AI Models
Yue Cheng
UVA Data Science & CS
Time: 2026-02-25, 12:00 - 13:00 ET
Location: Rice 540 and Zoom
Abstract: Large-scale model hubs (e.g., Hugging Face, Kaggle) and numerous private repositories collectively host millions of pretrained and fine-tuned models, primarily large language models (LLMs). As the de facto infrastructure for AI model development and sharing, these platforms support a vast ecosystem of downstream applications across research and industry. However, their storage footprint has grown explosively: Hugging Face alone hosted over 70 PB of model artifacts in late 2025 and continues to expand exponentially, posing mounting sustainability challenges.
In this talk, I will present a new perspective on sustainable, large-scale modern AI model storage that rethinks system design from the ground up. I will first show how fine-tuned models within a model family exhibit high latent redundancy, enabling storage systems to move beyond generic compression toward model-aware data reduction. I will then discuss why model-level assumptions fundamentally break down and why storage systems must be redesigned around a tensor-centric abstraction that minimizes redundancy at the tensor level. Finally, I will share a vision for a tensor-centric AI ecosystem built on this storage infrastructure.
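To make the tensor-level redundancy idea concrete, here is a minimal sketch of a content-addressed tensor store. This is an illustrative toy, not the system presented in the talk: it assumes models are maps from tensor names to raw bytes, and that identical tensors (e.g., frozen layers shared across fine-tuned variants of one base model) can be detected by content hash and stored once.

```python
import hashlib

# Hypothetical sketch: tensor-level deduplication across model checkpoints.
# Each "model" is a dict of tensor-name -> raw tensor bytes; tensors that
# are byte-identical across models are stored only once.

class TensorStore:
    def __init__(self):
        self.blobs = {}      # content hash -> raw tensor bytes (stored once)
        self.manifests = {}  # model name -> {tensor name: content hash}

    def put_model(self, name, tensors):
        manifest = {}
        for tname, data in tensors.items():
            h = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(h, data)  # dedup: keep one copy per hash
            manifest[tname] = h
        self.manifests[name] = manifest

    def get_model(self, name):
        # Reassemble a model from its manifest of tensor hashes.
        return {t: self.blobs[h] for t, h in self.manifests[name].items()}

# Two fine-tuned variants that share a frozen embedding tensor:
base_embed = bytes(1024)  # stand-in for a shared tensor's data
store = TensorStore()
store.put_model("ft-a", {"embed": base_embed, "head": b"\x01" * 64})
store.put_model("ft-b", {"embed": base_embed, "head": b"\x02" * 64})
print(len(store.blobs))  # 3 unique tensors stored instead of 4
```

Real model-aware reduction goes well beyond exact-match hashing (e.g., exploiting near-duplicate weights across fine-tunes), but even this simple scheme shows why a tensor-centric abstraction exposes redundancy that whole-file storage cannot see.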
Bio: Yue Cheng is an associate professor of Data Science and Computer Science at the University of Virginia. His research interests include AI systems, systems for AI, serverless computing, and data and storage systems. His group has built a number of techniques to improve the sustainability, efficiency, and scalability of cloud and AI platforms. Some of his work has led to large-scale deployment and adoption in public clouds and powers AI applications used by millions every day. He is a recipient of several awards and honors, including an Amazon Research Award (2020), an NSF CAREER Award (2021), a Meta Research Award (2022), the 2022 IEEE CS TCHPC Early Career Researchers Award for Excellence in HPC, and a Samsung GRO Award (2023).