Dr. Tong Wu (Stanford): “From Objects to Worlds: High-fidelity 3D/4D World Modeling”

March 4 @ 4:30 pm - 5:30 pm

Speaker: Dr. Tong Wu (Stanford)

Title: “From Objects to Worlds: High-fidelity 3D/4D World Modeling”


Abstract: Learning to model the 3D/4D world around us is a fundamental problem in computer vision and robotics, with broad impact on applications such as virtual reality, embodied AI, digital twins, and gaming. In this talk, I present our recent progress toward efficient and high-fidelity world modeling. At the object level, we contribute a large-scale realistic dataset to alleviate the scarcity of real-world 3D data. We further develop efficient representations that significantly improve both reconstruction fidelity and computational efficiency. Moving beyond isolated objects, we discuss the challenges of scaling to scene-level, explorable environments, where purely data-driven approaches become increasingly difficult. To address this, we investigate how to leverage strong diffusion priors and lift generative knowledge into explicit 3D representations, enabling consistent and immersive 3D scene synthesis. Finally, we turn to video generation as a form of world modeling. We introduce approaches that incorporate explicit 3D structure to support long-term static memory during autoregressive generation, enabling more stable and persistent world simulation.

Bio: Tong Wu is a postdoctoral researcher at Stanford University, working with Prof. Gordon Wetzstein. She received her Ph.D. in 2024 from the Multimedia Laboratory at The Chinese University of Hong Kong, advised by Prof. Dahua Lin. She has also worked closely with Prof. Ziwei Liu at Nanyang Technological University and previously served as a visiting student researcher at Stanford University with Prof. Gordon Wetzstein. She obtained her B.Eng. degree from the Department of Electronic Engineering at Tsinghua University in 2020. Her research interests lie in 3D vision, video generation, and world models.

 
