2025 SCIEN Industry Affiliates Distinguished Poster Award


Title: Policy-based Foveated Imaging

Abstract: As image sensor resolution rapidly increases, fixed readout bandwidth imposes a trade-off between spatial detail and frame rate, degrading the performance of video understanding tasks that require both fine detail and low latency. This constraint is especially acute for always-on edge devices such as smart glasses, which have limited data bandwidth and energy budgets. We address this issue with a policy-based, acquisition-time foveated imaging method that allocates higher resolution to task-relevant regions of interest (ROIs) while reducing resolution elsewhere, meeting bandwidth and latency requirements. We present a learning-based ROI-selection policy that integrates with modern video understanding models and is validated on real-world videos captured by ultra-high-resolution sensors.
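
The abstract leaves the readout mechanics at a high level. Purely as an illustration of acquisition-time foveation under a pixel budget, the sketch below keeps full resolution inside a policy-selected ROI and subsamples the periphery; the fixed ROI, the 4x factor, and the two-stream layout are assumptions, not the authors' design.

```python
import numpy as np

def foveated_readout(frame, roi, ds=4):
    """Keep full resolution inside the ROI; subsample everywhere else.

    frame: (H, W) sensor frame
    roi:   (y0, y1, x0, x1) box chosen by the ROI policy
    ds:    peripheral downsampling factor (4x is an assumption)
    """
    y0, y1, x0, x1 = roi
    fovea = frame[y0:y1, x0:x1]    # full-resolution, task-relevant region
    periphery = frame[::ds, ::ds]  # coarse context stream
    return fovea, periphery

# A learned policy would predict the ROI (e.g., from the previous
# low-resolution frame); a fixed box stands in for it here.
frame = np.random.rand(2160, 3840)  # hypothetical 4K sensor frame
fovea, periphery = foveated_readout(frame, roi=(800, 1312, 1600, 2112))
print(f"pixels read: {fovea.size + periphery.size} vs {frame.size} full-frame")
```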

Authors: Howard Xiao, Gordon Wetzstein

Bio: Howard Xiao is a first-year Electrical Engineering PhD student at Stanford University, advised by Prof. Gordon Wetzstein. Previously, he graduated from the University of Toronto, where he worked with Prof. Kyros Kutulakos and Prof. David Lindell. His research focuses on the intersection of computational imaging and machine learning.

Blue River Technology



Title: Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors

Abstract: We present Buffer Anytime, a framework for estimating depth and normal maps (which we call geometric buffers) from video that eliminates the need for paired video--depth and video--normal training data. Instead of relying on large-scale annotated video datasets, we demonstrate high-quality video buffer estimation by leveraging single-image priors with temporal consistency constraints. Our zero-shot training strategy combines state-of-the-art image estimation models with optical-flow-based temporal smoothness constraints through a hybrid loss function, implemented via a lightweight temporal attention architecture. Applied to leading image models such as Depth Anything V2 and Marigold-E2E-FT, our approach significantly improves temporal consistency while maintaining accuracy. Experiments show that our method not only outperforms image-based approaches but also achieves results comparable to state-of-the-art video models trained on large-scale paired video datasets, despite using no such paired data.
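
The hybrid loss is described only at a high level. As a rough sketch of what an optical-flow-based temporal smoothness term combined with an image-prior fidelity term could look like in PyTorch, consider the snippet below; the warping helper, the absence of occlusion handling, and the weight lambda_t are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(buf, flow):
    """Backward-warp a (B, C, H, W) buffer using optical flow (B, 2, H, W)."""
    B, _, H, W = buf.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(buf.device)  # (2, H, W)
    coords = base.unsqueeze(0) + flow                           # follow the flow
    gx = 2 * coords[:, 0] / (W - 1) - 1                         # to [-1, 1]
    gy = 2 * coords[:, 1] / (H - 1) - 1
    grid = torch.stack((gx, gy), dim=-1)                        # (B, H, W, 2)
    return F.grid_sample(buf, grid, align_corners=True)

def hybrid_loss(pred_t, pred_t1, teacher_t, flow_t_to_t1, lambda_t=1.0):
    """Image-prior fidelity plus flow-based temporal smoothness (illustrative)."""
    fidelity = F.l1_loss(pred_t, teacher_t)         # stay close to the image prior
    warped = warp_with_flow(pred_t1, flow_t_to_t1)  # bring frame t+1 into frame t
    temporal = F.l1_loss(pred_t, warped)            # penalize temporal flicker
    return fidelity + lambda_t * temporal
```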

Authors: Zhengfei Kuang, Tianyuan Zhang, Kai Zhang, Hao Tan, Sai Bi, Yiwei Hu, Zexiang Xu, Milos Hasan, Gordon Wetzstein, Fujun Luan

Bio: Zhengfei Kuang is a fourth-year Ph.D. student in Computer Science advised by Prof. Gordon Wetzstein. His research focuses on 3D computer vision, video generative models, and neural rendering.

Google

Title: A Dual-Transducer Large-Aperture Acoustic Camera for Advanced 3D Perception

Abstract: Acoustic time-of-flight (ToF) imaging delivers robust 3D perception with mm-scale resolution in low-visibility conditions. We present a 25 cm aperture, dual-transducer acoustic camera that interleaves 128 MEMS microphones (broadband, low gain) with 128 PMUTs (narrowband, high gain) in a shared field of view. The microphones provide mm-scale near-range resolution, while the PMUTs extend detection range. We apply this system to Airborne Sonar, which requires detecting laser-induced underwater pings across the air-water interface, a boundary that introduces severe attenuation and wavefront distortion. The PMUTs provide the requisite sensitivity to detect the weak underwater signals, while the microphones provide a high-resolution surface map that informs the channel model, correcting the wavefront distortions and enabling accurate localization of underwater signals.
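
For background on how a microphone array localizes an acoustic source, the snippet below implements textbook delay-and-sum beamforming for a single candidate 3D point; the sample rate and geometry are placeholders, and the dual-transducer fusion and air-water channel correction described above go well beyond this.

```python
import numpy as np

C_AIR = 343.0   # speed of sound in air, m/s
FS = 192_000    # sample rate in Hz (placeholder)

def delay_and_sum(signals, mic_xyz, point_xyz):
    """Beamformed power at one candidate source location.

    signals:   (N_mics, N_samples) recorded waveforms
    mic_xyz:   (N_mics, 3) element positions, meters
    point_xyz: (3,) candidate source position, meters
    Scanning this over a grid of points yields a 3D acoustic image.
    """
    dists = np.linalg.norm(mic_xyz - point_xyz, axis=1)
    delays = np.round((dists - dists.min()) / C_AIR * FS).astype(int)
    n = signals.shape[1] - delays.max()             # overlap after alignment
    aligned = np.stack([s[d:d + n] for s, d in zip(signals, delays)])
    return np.sum(aligned.sum(axis=0) ** 2)         # coherent summation
```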

Authors: William Meng, Megan Zeng, Alexander Suen, Brion Ye, Aidan Fitzpatrick, Ajay Singhvi, Amin Arbabian

Bio: William received his B.S. in Electrical Engineering from Columbia University in 2020. He is currently working toward the Ph.D. degree at Stanford University. His current research interests include acoustic phased arrays, laser optics, and computational imaging techniques. Megan received her B.S. in Electrical Engineering and Computer Sciences from UC Berkeley in 2023 and is currently working toward the Ph.D. degree at Stanford University. Her current research interests are in ultrasound imaging arrays and systems.

Title: Tracking predictive eye movements in natural viewing and walking

Abstract: We investigate how viewing context shapes predictive saccades, eye movements made in anticipation of future information. Participants wore mobile eye-tracking glasses (Neon, Pupil Labs) while either walking through a building or watching a static movie. Using a frame-wise gaze prediction model, we evaluated whether past or future frames better explained gaze selection behavior. Walking elicited a greater proportion of predictive saccades than static viewing, indicating that human gaze strategies become more forward-looking during natural motion. Our results highlight the importance of incorporating viewing context when developing predictive models of eye movements.
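
As a schematic of the past-versus-future analysis, one can score each gaze sample under gaze-prediction maps computed from temporally offset frames and compare which offsets explain gaze best. The snippet below sketches this; predict_gaze_map stands in for whatever frame-wise model is used and is an assumption, not the authors' model.

```python
import numpy as np

def offset_log_likelihood(gaze_xy, frames, predict_gaze_map, offset):
    """Mean log-likelihood of gaze under maps from frames shifted by `offset`.

    gaze_xy:          one (x, y) fixation per frame index t
    offset:           negative = explain gaze with past frames, positive = future
    predict_gaze_map: placeholder for a frame-wise model returning a
                      normalized (H, W) probability map
    """
    logps = []
    for t, (x, y) in enumerate(gaze_xy):
        s = t + offset
        if 0 <= s < len(frames):
            p = predict_gaze_map(frames[s])          # sums to 1 over pixels
            logps.append(np.log(p[int(y), int(x)] + 1e-12))
    return float(np.mean(logps))

# If positive offsets score higher than negative ones, gaze is
# anticipating upcoming frames, i.e., saccades are predictive.
```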

Authors: Hyunwoo Gu, Jiwon Yeon, Cameron Ellis, Justin Gardner

Bio: Hyunwoo Gu is a fourth-year Ph.D. student at Stanford University in Psychology working with Prof. Justin Gardner. His research examines human vision by combining classical psychophysics and eye-tracking with modern vision-language and diffusion-based models.

Title: Towards compact holographic AR/VR displays with nanophotonic devices

Abstract: Holographic displays use diffraction of coherent light to form 3D images in space, offering a promising path for AR/VR displays. Despite this potential, current holographic displays are constrained by limited etendue and bulky form factors, primarily due to the combiner and light-engine optics. We present two compact holographic display designs enabled by nanophotonic devices: metasurfaces and photonic integrated circuits (PICs). These emerging optical platforms provide subwavelength-scale wavefront modulation, introducing unprecedented capabilities for wide field-of-view, large-eyebox, and ultra-compact holographic displays. We further employ an AI-driven wave propagation model, improving image quality by capturing the characteristics of real-world display systems.
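
For context, the idealized free-space propagation model that learned (AI-driven) propagation models typically refine is the angular spectrum method; a minimal NumPy version follows, with wavelength and pixel pitch as placeholder parameters.

```python
import numpy as np

def angular_spectrum(field, wavelength, pitch, z):
    """Propagate a sampled complex field a distance z.

    field:      (H, W) complex field at the source plane
    wavelength: meters (e.g., 532e-9 for green)
    pitch:      sample pitch in meters (e.g., an SLM pixel pitch)
    z:          propagation distance in meters
    """
    H, W = field.shape
    fx = np.fft.fftfreq(W, d=pitch)
    fy = np.fft.fftfreq(H, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    propagating = arg > 0                            # drop evanescent waves
    kz = 2 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0))
    transfer = np.exp(1j * kz * z) * propagating
    return np.fft.ifft2(np.fft.fft2(field) * transfer)
```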

Authors: Seung-Woo Nam, Gun-Yeal Lee, Suyeon Choi, Gordon Wetzstein

Bio: Seung-Woo Nam is a postdoctoral researcher in the Stanford Computational Imaging Lab, working with Prof. Gordon Wetzstein. His research focuses on holography, AR/VR displays, computational imaging, visual perception, and metasurface optics. He received his PhD in Electrical and Computer Engineering from Seoul National University.

Title: Solving Inverse Problems in Imaging with Diffusion Models

Abstract: Diffusion generative priors have advanced reconstruction in ill-posed imaging by coupling data fidelity with perceptual realism. This poster adopts a unified posterior-inference view and organizes recent posterior-sampling approaches into three paradigms: (i) fine-tuning, where diffusion models are adapted to specific forward operators (e.g., conditional/latent diffusion); (ii) guidance-based methods that augment sampling with likelihood gradients or projection steps to enforce measurement consistency; and (iii) inference-time scaling strategies that calibrate noise schedules, guidance strengths, or solvers to maximize a user-specified reward. Within this taxonomy we present Dual Ascent Diffusion (DDiff), a guidance-based approach that alternates score-driven updates with principled data-consistency steps, yielding improved reconstructions over strong baselines in deconvolution, motion deblurring, phase retrieval, and other tasks. We conclude with open challenges and opportunities that chart a path toward trustworthy diffusion-based reconstruction.
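
To make paradigm (ii) concrete, the toy below adds the gradient of a Gaussian measurement log-likelihood to an unconditional score inside a Langevin-style update. This illustrates guidance-based posterior sampling generically; the standard-normal score stand-in, step sizes, and linear operator are assumptions, and this is not the DDiff algorithm itself.

```python
import torch

def guided_langevin_step(x, y, A, score_fn, eps=1e-3, guidance=1.0):
    """One Langevin update combining a prior score with measurement guidance.

    x:        current iterate, shape (n,)
    y:        measurements, shape (m,)
    A:        (m, n) linear forward operator (an assumption for this toy)
    score_fn: approximation of grad_x log p(x)
    """
    x = x.detach().requires_grad_(True)
    log_lik = -0.5 * ((A @ x - y) ** 2).sum()   # Gaussian data-consistency term
    (grad_lik,) = torch.autograd.grad(log_lik, x)
    with torch.no_grad():
        drift = score_fn(x) + guidance * grad_lik
        x = x + eps * drift + (2 * eps) ** 0.5 * torch.randn_like(x)
    return x

# Toy run with a standard-normal prior, whose exact score is -x.
A = torch.randn(5, 16)
y = A @ torch.randn(16)
x = torch.randn(16)
for _ in range(500):
    x = guided_langevin_step(x, y, A, score_fn=lambda v: -v)
```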

Authors: Sonia Minseo Kim, Gordon Wetzstein

Bio: Sonia Minseo Kim is a second-year MSEE student working in Professor Gordon Wetzstein’s Stanford Computational Imaging Lab.

Title: All-Optical Generative Models

Abstract: Generative artificial intelligence (GenAI) has reached unprecedented capabilities but remains fundamentally constrained by the power and heat dissipation limits of digital electronics. The energy demand of large-scale generative models calls for exploring alternative, domain-specific computing paradigms. Analog optical systems offer an appealing route, as light inherently enables passive, massively parallel, and ultrafast computation with minimal energy loss. However, realizing competitive all-optical generative models remains challenging due to the difficulty of mapping high-dimensional learning dynamics into purely optical operations. In this work, we present all-optical image generation models with user-input guidance and architecture-optimized generation paths, realized through programmable diffractive networks. Our results demonstrate that passive optical propagation can approximate modern generative processes with high fidelity and exceptional energy efficiency, providing a realistic pathway toward scalable and sustainable optical AI accelerators.
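
As a cartoon of computation by passive optics, the sketch below models a single trainable phase mask in the Fourier plane of a 4f system and fits it by gradient descent to produce a target intensity. The diffractive networks above cascade many such programmable layers and target generative sampling, which this toy does not attempt.

```python
import torch

def fourier_plane_layer(field, phase):
    """One 4f optical layer: lens -> phase mask -> lens.

    Modeled as FFT, pointwise phase modulation, inverse FFT.
    """
    spectrum = torch.fft.fft2(field)
    mask = torch.polar(torch.ones_like(phase), phase)  # unit-amplitude phase mask
    return torch.fft.ifft2(spectrum * mask)

# Fit the mask so a fixed input field yields a target intensity image.
H = W = 64
phase = torch.zeros(H, W, requires_grad=True)
field_in = torch.randn(H, W, dtype=torch.complex64)
target = torch.rand(H, W)
opt = torch.optim.Adam([phase], lr=0.05)
for _ in range(200):
    intensity = fourier_plane_layer(field_in, phase).abs() ** 2
    loss = torch.mean((intensity - target) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()
```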

Authors: Ilker Oguz, Suyeon Choi, Gordon Wetzstein

Bio: Ilker is a Postdoctoral Researcher in the Stanford Computational Imaging Lab, working on optics-based hardware implementations of modern neural networks and AI architectures. Prior to joining Stanford, he completed his PhD in Photonics at EPFL and his MSc at ETH Zurich, Switzerland.