Recognize Anything in 3D with Minimal Human Supervision

Zezhou Cheng
UVA CS

Time: 2024-12-04, 12:00 - 13:00 ET
Location: Rice 540 and Zoom

Abstract: Despite rapid progress, the success of computer vision remains largely confined to 2D vision tasks, such as image classification and 2D object detection, driven by large-scale image datasets and human annotations. However, we live in a 3D world with an infinite variety of object categories. Efficient and accurate 3D recognition remains a significant challenge due to the scarcity of 3D data and the high cost of 3D annotations. In this talk, I will share our recent efforts to address these challenges: (1) a method to learn 3D point cloud representations without any human-created 3D shapes; (2) a framework for recognizing and localizing objects of any category in 3D from a single image; and (3) a comprehensive benchmark of self-supervised learning methods across diverse 3D vision tasks.

Bio: Zezhou Cheng is an Assistant Professor of Computer Science at the University of Virginia. His research interests include computer vision, machine learning, and their applications to ecology, material discovery, VR/AR, and autonomous vehicles. He has received awards for his work, including the Best Synthesis Award in the Computer Science Department at UMass Amherst in 2020 and the Best Poster Award at the New England Computer Vision Workshop in 2019. He has also served on the program committees of major computer vision conferences and was recognized as an Outstanding Reviewer at CVPR 2021.