2025 SCIEN Affiliates Meeting Poster Presentations

Index to Posters 

(We are constantly updating this page, so check back later for additions)

Multifunctional Spaceplates for Optical Aberration Correction by Yixuan Shao, Robert Lupoiu, Tianxiang Dai and Jonathan Fan

Geometric neural parameterization for tiled metasurfaces with rotational symmetry and Q factor control by Tianxiang Dai, Yixuan Shao, Chenkai Mao, Yu Wu, Sara Azzouz, You Zhou & Jonathan A. Fan

Pulsatile Brain Motion as a Marker of Brain Aging and Dementia: Insights from 3D quantitative-amplified MRI (q-aMRI) by Itamar Terem, Kyan Younes, Skylar Weiss, Andrew Dreisbach, Hillary Vossler, Eryn Kwon, Daniel Cornfeld, Jet Wright, Paul Condron, Kathleen L Poston, Victor W Henderson, Elizabeth C Mormino, Samantha Holdsworth, and Kawin Setsompop

Towards compact holographic AR/VR displays with nanophotonic devices by Seung-Woo Nam, Gun-Yeal Lee, Suyeon Choi, Gordon Wetzstein

Wave Splatting of 3D Primitives for Computer-generated Holography by Brian Chao, Jackie Yang, Suyeon Choi, Manu Gopakumar

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation by Hansheng Chen, Kai Zhang, Hao Tan, Leonidas Guibas, Gordon Wetzstein, Sai Bi

Policy-based Foveated Imaging by Howard Xiao, Gordon Wetzstein

Image2Garment: Simulation-ready Garments from a Single Image by Selim Emir Can, Jan Ackermann, Kiyohiro Nakayama, Ruofan Liu, Tong Wu, Yang Zheng, Hugo Bertiche, Menglei Chai, Thabo Beeler, Gordon Wetzstein

Solving Inverse Problems in Imaging with Diffusion Models by Sonia Minseo Kim, Gordon Wetzstein

All-Optical Generative Models by Ilker Oguz, Suyeon Choi, Gordon Wetzstein

Compact computational camera by joint metasurface and network design by Kelsey Lee, Suyeon Choi, Gun-Yeal Lee

AlphaEnsemble: A Multi-Modal Foundation Model for Protein Ensemble Prediction With Raw Supervision on Crystallographic and Cryo-EM Data by Jay Shenoy, Axel Levy, Miro Astore, Frederic Poitevin, Sonya Hanson, Gordon Wetzstein

Ultra-High-Resolution Time-of-Flight Imaging with Free-Space Photoelastic Modulator (PEM) by S. H. Baskaya, A. Arbabian, W. Meng

A Dual-Transducer Large-Aperture Acoustic Camera for Advanced 3D Perception by William Meng, Megan Zeng, Alexander Suen, Brion Ye, Aidan Fitzpatrick, Ajay Singhvi, Amin Arbabian

Tracking predictive eye movements in natural viewing and walking by Hyunwoo Gu, Jiwon Yeon, Cameron Ellis, Justin Gardner

Operando 4D Imaging for Automation of Multimaterial Additive Manufacturing by Elise Yang, Joshua Cheung, Colin Ophus, Eric Darve, Natalie Larson

Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization by Yang You, Mikaela Angelina Uy, Jiaqi Han, Rahul Thomas, Haotong Zhang, Yi Du, Hansheng Chen, Francis Engelmann, Suya You, Leonidas Guibas

Deep Learning–Enabled Real-Time Optical Hologram Synthesis and Control for Neutral-Atom Quantum Processors by Yang Xu, Timothy Chang, Nick Gharabaghi, Jenni Solgaard, Joonhee Choi

GPU accelerated atomistic wave-optics simulation toolbox by Dorian P. Luccioni, Leora E. Dresselhaus-Marais

Knowledge-Driven Edge-Cloud Communication Framework by Chae Young Lee, Sara Achour, Zerina Kapetanovic

Ultrafast Acoustic Metrology for Characterizing Defects in Semiconductor Materials by Brinthan Kanesalingam and Leora Dresselhaus-Marais

Enhancing Visual Perception for AMD Patients through Depth-Guided Object and Face Recognition by Raina Song

Self Supervised Deep Priors for Solving Inverse Problems in Electron Microscopy by Arthur R. C. McCray, Cedric Lim, Corneel Casert, Stephanie Ribet, Colin Ophus

Long-Context State-Space Video World Models by Ryan Po, Yotam Nitzan, Richard Zhang, Berlin Chen, Tri Dao, Eli Shechtman, Gordon Wetzstein, Xun Huang

Vision Assisted Beamsteering for Full Control of Wireless Power Transfer and Joint Communication Link Budgets by Jasmin Falconer, Geneva Ecola, Zerina Kapetanovic

Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors by Zhengfei Kuang, Tianyuan Zhang, Kai Zhang, Hao Tan, Sai Bi, Yiwei Hu, Zexiang Xu, Milos Hasan, Gordon Wetzstein, Fujun Luan

 

Abstracts


Title: Multifunctional Spaceplates for Optical Aberration Correction

Abstract: Spaceplates are nonlocal optical devices with the potential to reduce the form factor of optical systems, but current implementations are limited in performance and ability to correct for optical aberrations. We introduce a new class of multifunctional spaceplates, designed using gradient-based freeform optimization, which exhibit exceptional efficiencies, compression ratios, and aberration correcting capabilities. We show that such spaceplates can serve as correctors for metasurfaces and refractive optical elements, and we demonstrate that multifunctional spaceplates can extend to support multispectral, Petzval field curvature, and spherical aberration correction. We anticipate that these spaceplate concepts will enable the realization of ultracompact optical systems.

Authors: Yixuan Shao, Robert Lupoiu, Tianxiang Dai, Jiaqi Jiang, You Zhou, Jonathan A. Fan

Bio: Yixuan Shao is a 5th-year PhD student working on optical meta-structures. He has years of experience in optical design, simulations, and experiments, and he is currently looking for job opportunities in the industry.


Title: Geometric neural parameterization for tiled metasurfaces with rotational symmetry and Q factor control

Abstract: We present an upgraded Neuroshaper that demonstrates two independent capabilities. The first is symmetry-aware tiling with square and hexagonal unit cells and rotational symmetry, which enables chiral device layouts over large areas while respecting fabrication constraints. The second is Q factor control that can either target a specified Q or maximize Q using frequency-domain gradients and adjoint evaluation. The Q objective is independent of tiling and applies to resonant waveguides, metasurfaces, and cavities. Together these results showcase the flexibility of the neural level set framework with differentiable penalties for minimum feature size, curvature, and connectivity, as well as multi-resolution training and efficient adjoint simulation. We report devices where chiral tiling provides handed response and robust pattern regularity, and devices where Q-focused optimization sharpens resonance without violating constraints. The two tracks can be combined, but neither requires the other. The poster outlines the algorithms, presents representative results, and shares practical guidance for using Neuroshaper on chiral tiling and on Q factor optimization.

Authors: Tianxiang Dai, Yixuan Shao, Chenkai Mao, Yu Wu, Sara Azzouz, You Zhou & Jonathan A. Fan

Bio: Tianxiang Dai is a PhD candidate in electrical engineering at Stanford University, working at the intersection of computational photonics and inverse design. He designed and implemented the core Neuroshaper algorithm and performed the optical simulations for the study on geometric neural parameterization of freeform nanophotonic devices.


Title: Pulsatile Brain Motion as a Marker of Brain Aging and Dementia: Insights from 3D quantitative-amplified MRI (q-aMRI)

Abstract: Heart–brain interactions, including cardiac-induced brain pulsatility, may play a key role in brain homeostasis, aging, and neurodegeneration. We applied 3D quantitative amplified MRI (q-aMRI) to measure sub-voxel brain motion in 105 participants aged 18–93, including healthy controls and individuals on the Alzheimer’s disease and Lewy body disease spectra. Five expert readers classified motion as normal or abnormal, achieving substantial agreement (Fleiss’ kappa = 0.662). Strain tensor features were used to inform logistic-regression and brain-age prediction models, both trained via repeated cross-validation. The classifier achieved a mean AUC of 99.4%, while the brain-age model reached a mean absolute error of 7.78 years (R² = 0.754; Pearson r = 0.873). Because the features correlated strongly with age and abnormal motion appeared primarily after age 50, we removed age effects and retrained the classifier in older participants, achieving a mean AUC of 84.3%. Principal component analysis (PCA) of strain features highlighted distinct and interpretable biomechanical patterns that distinguished normal from abnormal motion. These signatures provide mechanistic insight into how brain biomechanics change with aging and dementia. Abnormal brain motion was associated with higher odds of a clinical diagnosis based on clinical impression alone (OR = 5.06), in amyloid-positive individuals regardless of clinical impairment stage (OR = 4.20), and in amyloid-positive individuals with a confirmed clinical diagnosis (OR = 9.44), all p < 0.05 (one-sided Fisher’s exact test). These findings suggest that q-aMRI–derived brain motion may serve as a biomarker of brain homeostasis in aging and dementia.
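
As a concrete illustration of the evaluation protocol mentioned above (a logistic-regression classifier scored by repeated cross-validation and reported as mean AUC), the sketch below uses scikit-learn on placeholder data; the feature matrix, labels, and fold/repeat counts are illustrative and are not the study's data or settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: rows are participants, columns are strain-tensor features.
rng = np.random.default_rng(0)
X = rng.normal(size=(105, 12))
y = rng.integers(0, 2, size=105)

# Logistic regression evaluated with repeated stratified cross-validation,
# reporting the mean ROC AUC across all folds and repeats.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"mean AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```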

Authors: Itamar Terem, Kyan Younes, Skylar Weiss, Andrew Dreisbach, Hillary Vossler, Eryn Kwon, Daniel Cornfeld, Jet Wright, Paul Condron, Kathleen L Poston, Victor W Henderson, Elizabeth C Mormino, Samantha Holdsworth, and Kawin Setsompop

Bio: Itamar Terem recently completed his PhD in Electrical Engineering at Stanford University, where he was supported by the NSF Graduate Research Fellowship. His work centers on advancing Magnetic Resonance Imaging (MRI) through new computational and acquisition methodologies to characterize pulsatile brain dynamics. His research explores how cardiac-driven tissue motion and cerebrospinal fluid (CSF) flow reflect underlying brain biomechanics and clearance mechanisms, with the goal of developing novel biomarkers for aging and neurological disease.


Title: Towards compact holographic AR/VR displays with nanophotonic devices

Abstract: Holographic displays utilize diffraction of coherent light to form 3D images in space, offering a promising path for AR/VR displays. Despite this potential, current holographic displays are constrained by limited etendue and bulky form factors, primarily due to the combiner and light-engine optics. We present two compact holographic display designs enabled by nanophotonic devices: metasurfaces and photonic integrated circuits (PICs). These emerging optical platforms provide subwavelength-scale wavefront modulation, introducing unprecedented capabilities for wide field-of-view, large-eyebox, and ultra-compact holographic displays. We further employ an AI-driven wave propagation model, improving image quality by capturing the characteristics of real-world display systems.

Authors: Seung-Woo Nam, Gun-Yeal Lee, Suyeon Choi, Gordon Wetzstein

Bio: Seung-Woo Nam is a postdoctoral researcher in the Stanford Computational Imaging Lab, working with Prof. Gordon Wetzstein. His research focuses on holography, AR/VR displays, computational imaging, visual perception, and metasurface optics. He received his PhD in Electrical and Computer Engineering from Seoul National University.


Title: Wave Splatting of 3D Primitives for Computer-generated Holography

Abstract: Recent advances in neural rendering have introduced Gaussian scene representations that enable photorealistic view synthesis from sparse images. Building on this foundation, we introduce a new wave-optics neural rendering framework that directly transforms these Gaussian primitives into holograms. Our initial formulation, Gaussian Wave Splatting (GWS), derives a closed-form Gaussian-to-hologram transform supporting occlusions and alpha blending, establishing the first bridge between neural scene representations and holographic display synthesis.

Extending this concept, Random-Phase Wave Splatting (RPWS) generalizes GWS to random-phase translucent primitives, such as Gaussians and soft-edged triangles, addressing GWS’s smooth-phase limitations in defocus, parallax, and bandwidth utilization. RPWS introduces a physically grounded wavefront compositing and statistical alpha-blending model that fully leverages the spatial light modulator’s bandwidth, enabling accurate defocus blur, realistic occlusions, and light-field-like parallax.

Together, GWS and RPWS form a unified, scalable framework for wave-splatting-based holography, advancing neural rendering toward photorealistic, compact near-eye holographic displays with natural focus cues.

Authors: Brian Chao, Jackie Yang, Suyeon Choi, Manu Gopakumar

Bio: Brian Chao is a 4th year PhD candidate at Stanford University working in the Stanford Computational Imaging Lab, advised by Prof. Gordon Wetzstein. His research is generously supported by the NSF GRFP and the Stanford Graduate Fellowship.

His research focuses on developing physics-grounded neural rendering algorithms that unify computer graphics, physics, and ML to enable new capabilities in next-generation AR/VR displays, 3D scene reconstruction, and virtual world generation. He is currently expanding this expertise into generative modeling, focusing on integrating 3D information with video models to improve 3D consistency and efficiency.


Title: pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

Abstract: Few-step diffusion or flow-based generative models typically distill a velocity-predicting teacher into a student that predicts a shortcut towards denoised data. This format mismatch has led to complex distillation procedures that often suffer from a quality-diversity trade-off. To address this, we propose policy-based flow models (pi-Flow). pi-Flow modifies the output layer of a student flow model to predict a network-free policy at one timestep. The policy then produces dynamic flow velocities at future substeps with negligible overhead, enabling fast and accurate ODE integration on these substeps without extra network evaluations. To match the policy’s ODE trajectory to the teacher’s, we introduce a novel imitation distillation approach, which matches the policy’s velocity to the teacher’s along the policy’s trajectory using a standard ℓ2 flow matching loss. By simply mimicking the teacher’s behavior, pi-Flow enables stable and scalable training and avoids the quality-diversity trade-off. On ImageNet 256×256, it attains a 1-NFE FID of 2.85, outperforming MeanFlow of the same DiT architecture. On FLUX.1-12B and Qwen-Image-20B at 4 NFEs, pi-Flow achieves substantially better diversity than state-of-the-art few-step methods, while maintaining teacher-level quality.
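
The imitation-distillation objective described above can be pictured with a short schematic. The sketch below is a simplified PyTorch-style illustration, not the authors' implementation: `policy_net`, `teacher_net`, and the network-free `pol.velocity` interface are assumed names, and a plain Euler rollout over a few substeps stands in for the ODE integration.

```python
import torch

def imitation_distillation_loss(policy_net, teacher_net, x_t, t, substeps=4):
    """Schematic pi-Flow-style imitation loss (simplified illustration).

    The student is evaluated once at time t and outputs a network-free policy;
    that policy is rolled out with Euler steps, and its velocity is matched to
    the teacher's velocity along the rollout with an l2 flow-matching loss.
    """
    pol = policy_net(x_t, t)                  # single student call -> policy object
    x, s = x_t, t
    dt = (1.0 - t) / substeps                 # remaining time split into cheap substeps
    loss = x_t.new_zeros(())
    for _ in range(substeps):
        v_pi = pol.velocity(x, s)             # network-free velocity from the policy
        with torch.no_grad():
            v_teacher = teacher_net(x, s)     # teacher velocity at the policy's own state
        loss = loss + ((v_pi - v_teacher) ** 2).mean()
        x = x + dt * v_pi                     # Euler ODE step along the policy trajectory
        s = s + dt
    return loss / substeps
```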

Authors: Hansheng Chen, Kai Zhang, Hao Tan, Leonidas Guibas, Gordon Wetzstein, Sai Bi

Bio: Hansheng Chen is a 3rd year PhD student from the Stanford Computer Science department, co-advised by Prof. Leonidas Guibas and Prof. Gordon Wetzstein. His research interest lies in the fundamentals of generative models, with a current focus on diffusion and flow-based models. Previously, he worked on 3D generation and pose estimation, and his work EPro-PnP was awarded the CVPR 2022 Best Student Paper.


Title: Policy-based Foveated Imaging

Abstract: As image sensor resolution rapidly increases, fixed readout bandwidth imposes a trade-off between spatial detail and frame rate, degrading the performance of video understanding tasks that require both fine detail and low latency. This constraint is sharper on always-on edge devices such as smart glasses that have limited data bandwidth and energy budgets. We address this issue by developing a policy-based, acquisition-time foveated imaging method that allocates higher resolution to task-relevant regions of interest (ROIs) while reducing resolution elsewhere, meeting bandwidth and latency requirements. We present a learning-based ROI-selection policy that integrates with modern video understanding models and is validated on real-world videos captured by ultra-high-resolution sensors.

Authors: Howard Xiao, Gordon Wetzstein

Bio: Howard Xiao is a first year Electrical Engineering PhD student at Stanford University, advised by Prof. Gordon Wetzstein. Previously, he graduated from the University of Toronto, working with Prof. Kyros Kutulakos and Prof. David Lindell. His research focuses on the intersection of computational imaging and machine learning.


Title: Image2Garment: Simulation-ready Garments from a Single Image

Abstract: We present Image2Garment, a framework for reconstructing the 3D shape and fabric physical parameters of garments from a single image. Accurately recovering both geometry and material behavior is crucial for realistic virtual try-on, digital avatars, and physics-based simulation, yet remains highly ill-posed due to the scarcity of paired image-to-physics data. We address this challenge by decomposing the task into two stages: first predicting garment materials and fabric attributes from images, and then estimating the physical parameters that describe the mechanical response of fabrics under deformation from those intermediate attributes. Since material composition and fabric metadata of commercial garments are readily available online, this staged formulation enables effective learning from large-scale, weakly paired data. To support this, we curate a dataset that provides aligned 3D garment meshes and fabric parameters, which we will release publicly. Building on these resources, we develop a fast feed-forward model leveraging fine-tuned vision–language representations to infer fabric properties and generate simulation-ready garments from visual input. Image2Garment achieves state-of-the-art accuracy in both single-view and dynamic garment reconstruction compared to prior optimization-based approaches, establishing a new direction toward learning physically grounded garment representations from in-the-wild imagery.

Authors: Selim Emir Can, Jan Ackermann, Kiyohiro Nakayama, Ruofan Liu, Tong Wu, Yang Zheng, Hugo Bertiche, Menglei Chai, Thabo Beeler, Gordon Wetzstein

Bio: Kiyohiro (George) Nakayama is a 1st-year PhD student in Computer Science at the Stanford Computational Imaging Group. Selim Emir Can is a 1st-year MS student in Electrical Engineering at the Stanford Computational Imaging Group.


Title: Solving Inverse Problems in Imaging with Diffusion Models

Abstract: Diffusion generative priors have advanced reconstruction in ill‑posed imaging by coupling data fidelity with perceptual realism. This poster adopts a unified posterior‑inference view, and organizes recent posterior‑sampling approaches into three paradigms: (i) fine‑tuning, where diffusion models are adapted to specific forward operators (e.g., conditional/latent diffusion); (ii) guidance‑based methods that augment sampling with likelihood gradients or projection steps to enforce measurement consistency; and (iii) inference‑time scaling strategies that calibrate noise schedules, guidance strengths, or solvers to maximize user-specified reward. Within this taxonomy we present Dual Ascent Diffusion (DDiff), a guidance‑based approach that alternates score‑driven updates with principled data‑consistency steps, yielding improved reconstructions over strong baselines in deconvolution, motion deblurring, phase retrieval, etc. We conclude by noting open challenges and opportunities. These chart a path toward trustworthy diffusion‑based reconstruction.
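
For the guidance-based paradigm (ii), a single posterior-sampling update typically interleaves the learned score with a measurement-consistency gradient. The sketch below is a generic DPS-style step under assumed names (`score_model`, `forward_op`, step sizes); it is illustrative only and is not the DDiff algorithm itself.

```python
import torch

def guided_step(x_t, t, score_model, forward_op, y, sigma_t, step_size, guidance_scale):
    """One guidance-based posterior-sampling update (schematic, DPS-style).

    Combines the unconditional score step with a measurement-consistency
    gradient on ||y - A(x0_hat)||^2; all names and scales are illustrative.
    """
    x_t = x_t.detach().requires_grad_(True)
    score = score_model(x_t, t)                         # learned prior score
    x0_hat = x_t + (sigma_t ** 2) * score               # Tweedie-style denoised estimate
    data_fit = ((y - forward_op(x0_hat)) ** 2).sum()    # data-fidelity term
    grad = torch.autograd.grad(data_fit, x_t)[0]
    # Prior-driven update plus a likelihood-guidance correction.
    return (x_t + step_size * score - guidance_scale * grad).detach()
```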

Authors: Sonia Minseo Kim, Gordon Wetzstein

Bio: Sonia Minseo Kim is a 2nd-year MSEE student working at Professor Gordon Wetzstein’s Stanford Computational Imaging Lab.


Title: All-Optical Generative Models

Abstract: Generative artificial intelligence (GenAI) has reached unprecedented capabilities but remains fundamentally constrained by the power and heat dissipation limits of digital electronics. The energy demand of large-scale generative models calls for exploring alternative, domain-specific computing paradigms. Analog optical systems offer an appealing route, as light inherently enables passive, massively parallel, and ultrafast computation with minimal energy loss.
However, realizing competitive all-optical generative models remains challenging due to the difficulty of mapping high-dimensional learning dynamics into purely optical operations. In this study, we present all-optical image generation models with user-input guidance and architecture-optimized generation paths through programmable diffractive networks. Our results demonstrate that passive optical propagation can approximate modern generative processes with high fidelity and exceptional energy efficiency, providing a realistic pathway toward scalable and sustainable optical AI accelerators.

Authors: Ilker Oguz, Suyeon Choi, Gordon Wetzstein

Bio: Ilker is a postdoctoral researcher in the Stanford Computational Imaging Laboratory, working on optics-based hardware implementations of modern neural networks and AI architectures. Prior to joining Stanford, he completed his PhD in the School of Photonics at EPFL and his MSc at ETH Zurich, Switzerland.


Title: Compact computational camera by joint metasurface and network design

Abstract: Optical computing is emerging as a promising platform for high-speed and energy-efficient information processing, with growing advances in optical hardware enabling light-based computation. Metalenses—flat optical elements composed of subwavelength nanostructures—enable compact and programmable wavefront control, making them a natural candidate for optical–computational integration. We aim to design a compact computational camera that jointly optimizes a metasurface phase layer with a lightweight computational backend to improve inference performance and efficiency. To achieve this, we developed a hybrid ray–wave optical model that captures the spatially varying point-spread functions (PSFs) of the metalens in a fully differentiable optimization framework. We demonstrate our approach on an image classification task, co-designing a singlet metalens phase profile and neural inference model. Future work will explore multilayer metasurfaces to expand optical degrees of freedom and support more robust optical–computational imaging systems.
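
To make the co-design idea concrete, the toy sketch below jointly optimizes a learnable phase profile and a small classifier through a single far-field (Fraunhofer) PSF. It omits the spatially varying PSFs and the hybrid ray-wave model of the actual framework, and all layer sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffractiveCamera(nn.Module):
    """Toy end-to-end differentiable camera: a learnable phase mask defines a
    PSF via a far-field (Fraunhofer) approximation, the input image is blurred
    by that PSF, and a small CNN classifies the blurred sensor image. Gradients
    reach both the phase profile and the network weights."""

    def __init__(self, mask_size=33, n_classes=10):
        super().__init__()
        self.phase = nn.Parameter(torch.zeros(mask_size, mask_size))  # phase profile (rad)
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, n_classes),
        )

    def psf(self):
        pupil = torch.exp(1j * self.phase)                   # unit-amplitude pupil
        far_field = torch.fft.fftshift(torch.fft.fft2(pupil))
        intensity = far_field.abs() ** 2
        return intensity / intensity.sum()                   # normalized PSF

    def forward(self, img):                                  # img: (B, 1, H, W)
        kernel = self.psf()[None, None]
        blurred = F.conv2d(img, kernel, padding="same")      # image formation by the PSF
        return self.net(blurred)
```

Training the classification loss end-to-end then updates the phase mask and the network together, which is the essence of the joint optics/network design described above.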

Authors: Kelsey Lee, Suyeon Choi, Gun-Yeal Lee

Bio: Kelsey Lee is a first year PhD student in Electrical Engineering supported by SGF. She recently graduated from the University of Rochester with an M.S. and B.S. in Optical Engineering and is broadly interested in computational imaging and nano-photonics.


Title: AlphaEnsemble: A Multi-Modal Foundation Model for Protein Ensemble Prediction With Raw Supervision on Crystallographic and Cryo-EM Data

Abstract: A fundamental challenge in modern structural biology is the inability of state-of-the-art foundation models like AlphaFold to capture dynamic conformational ensembles of proteins, as they are exclusively trained on static, refined atomic structures from the PDB. This dependency introduces systemic bias and limits our understanding of molecular function, which is intrinsically dynamic. This project proposes to develop AlphaEnsemble, a novel multi-modal foundation model that overcomes this limitation by shifting its supervisory signal from refined coordinates to raw experimental observables: X-ray crystallographic structure factors and cryo-EM density maps. The methodology involves integrating a specialized ensemble network onto a frozen AlphaFold backbone and training it end-to-end using a differentiable forward model of the experimental capture process. AlphaEnsemble represents the first unified framework that rigorously connects generative structure prediction with physics-constrained inverse modeling. Along with more accurate ensemble prediction, our method will enable real-time ensemble refinement directly from experimental data, which will accelerate analysis during crystallography and cryo-EM experiments and generate high-fidelity structural ensembles to advance fundamental structural science.

Authors: Jay Shenoy, Axel Levy, Miro Astore, Frederic Poitevin, Sonya Hanson, Gordon Wetzstein

Bio: Jay is a fourth-year PhD student in the Computational Imaging Lab advised by Prof. Gordon Wetzstein. His research focuses on AI for science, specifically integrating raw experimental data into the generative modeling process. In the past, he has worked on developing 3D reconstruction methods for protein imaging from massive X-ray datasets.


Title: Ultra-High-Resolution Time-of-Flight Imaging with Free-Space Photoelastic Modulator (PEM)

Abstract: We present a 100-megapixel indirect time-of-flight (iToF) imaging system that leverages amplitude-modulated continuous-wave (AMCW) illumination and a free-space GaAs photoelastic modulator operating at 5.4 MHz to demodulate the returning signal into a low-frequency beat waveform optically. By shifting the correlation process from the electronic to the optical domain, the architecture bypasses the bandwidth and data-throughput limitations of conventional ToF sensors, enabling the use of standard high-resolution CMOS cameras as depth sensors. This approach dramatically reduces system cost while preserving high spatial fidelity. The current prototype achieves 100 MP capture with a depth resolution of 20–25 cm over working distances of 0.5–5 m, and 1–5 cm depth resolution with lab-controlled long averaging. As a software-defined and highly configurable platform, it targets high-fidelity applications including defect inspection, industrial automation, scientific imaging, and outdoor sensing. We demonstrate robust performance on challenging scenarios involving low-reflectivity materials, fine surface textures, and small-scale features at short-medium range, domains traditionally inaccessible to low-resolution depth systems. This work establishes a pathway toward next-generation depth sensing that integrates extreme spatial resolution with photonic modulation, opening opportunities for higher modulation frequencies, improved precision, and real-time adaptation across diverse environments.
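
For a sense of scale, the standard AMCW phase-to-depth relation at the stated 5.4 MHz modulation frequency is sketched below; this is textbook iToF arithmetic, not a detail taken from the poster, and the constants are illustrative.

```python
from math import pi

C = 299_792_458.0          # speed of light (m/s)
F_MOD = 5.4e6              # modulation / demodulation frequency (Hz)

def depth_from_phase(phase_rad: float) -> float:
    """Standard AMCW relation: the measured phase shift of the returned
    modulation envelope maps to distance as d = c * phi / (4 * pi * f_mod)."""
    return C * phase_rad / (4 * pi * F_MOD)

# Unambiguous range of a single 5.4 MHz modulation tone before phase wrapping:
print(C / (2 * F_MOD))     # ~27.8 m
print(depth_from_phase(0.1))  # depth for a 0.1 rad phase shift, ~0.44 m
```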

Authors: S. H. Baskaya, A. Arbabian, W. Meng

Bio: Saner is a first-year PhD student in Electrical Engineering at Stanford University, working with Professor Amin Arbabian. His research focuses on developing a scalable, software-defined 100-megapixel AMCW-based depth imaging platform capable of rapid adaptation to diverse sensing modalities. His work aims to extend depth sensing into previously inaccessible regimes—particularly at medium to long ranges—by combining extreme spatial resolution, configurable depth precision, and robust performance across a wide range of surface reflectances.

William received his B.S. in Electrical Engineering from Columbia University in 2020. He is currently working toward a Ph.D. degree at Stanford University. His current research interests include acoustic phased arrays, laser optics, and computational imaging techniques.


Title: A Dual-Transducer Large-Aperture Acoustic Camera for Advanced 3D Perception

Abstract: Acoustic time-of-flight (ToF) imaging offers robust 3D perception for robotics, autonomous platforms, and environmental sensing. Compared to other modalities such as optics and RF, acoustic imaging works in low-visibility conditions (e.g., fog and smoke) while providing high angular resolution with cost-effective transducers.

We present a large-aperture (25 cm) acoustic camera that combines two complementary sensor types (128 low-gain broadband MEMS microphones and 128 high-gain narrowband PMUTs) into an interleaved array with a shared field of view (FoV). The broadband microphones deliver improved range resolution at short distances, while the PMUTs provide an extended detection range.

As an example application, we demonstrate the utility of our acoustic camera in the Airborne Sonar system, wherein laser-induced underwater pings must be detected in air despite severe attenuation and wavefront distortion from the dynamic water surface. Our dual-transducer architecture provides the high sensitivity required for detecting weak underwater signals, while also providing mm-scale range resolution for precisely mapping the water surface in order to compensate for wavefront distortions.

Authors: William Meng, Megan Zeng, Alexander Suen, Brion Ye, Aidan Fitzpatrick, Ajay Singhvi, Amin Arbabian

Bio: William received his B.S. in Electrical Engineering from Columbia University in 2020. He is currently working toward the Ph.D. degree at Stanford University. His current research interests include acoustic phased arrays, laser optics, and computational imaging techniques.

Megan received her B.S. in Electrical Engineering and Computer Sciences from UC Berkeley in 2023, and is currently working toward the Ph.D. degree at Stanford University. Her current research interests are in ultrasound imaging arrays and systems.


Title: Tracking predictive eye movements in natural viewing and walking

Abstract: We investigate how viewing context shapes predictive saccades — eye movements made in anticipation of future information. Participants wore mobile eye-tracking glasses (Neon, Pupil Labs) while either walking through a building or watching a static movie. Using a frame-wise gaze prediction model, we evaluated whether past or future frames better explained gaze selection behavior. Walking elicited a higher level of predictive saccades than static viewing, indicating that human gaze strategies become more forward-looking in natural motion. Our results highlight the importance of incorporating the context when developing predictive models of eye movements.

Authors: Hyunwoo Gu, Jiwon Yeon, Cameron Ellis, Justin Gardner

Bio: Hyunwoo Gu is a fourth-year Ph.D. student at Stanford University in Psychology working with Prof. Justin Gardner. His research examines human vision by combining classical psychophysics and eye-tracking with modern vision-language and diffusion-based models.


Title: Operando 4D Imaging for Automation of Multimaterial Additive Manufacturing

Abstract: Multimaterial additive manufacturing (AM) is enabling rapid innovation and development of living and multifunctional materials systems with highly customizable and complex designs. However, progress is constrained by our inability to fully visualize and understand the fabrication process in 4D (3D + time). This poster introduces this challenge in the context of multimaterial extrusion-based AM, and proposes a solution: a 4D vision system for real-time feedback control. This would enable automation of calibration processes and development of defect mitigation strategies, enhancing time and material efficiency and accelerating research innovations. The primary aims involve (1) designing a physics-aware, implicit neural representation algorithm for rapid volume reconstruction, (2) creating an optical projection tomography (OPT) system for compact, single-shot 3D print capture, and (3) integrating the reconstruction algorithm with the OPT system to implement closed-loop feedback control for rapid defect detection and correction during multimaterial 3D prints. The research will address the following key opportunities: high-speed implementation for real-time feedback control; few observations due to limited camera space; and varying refractive indices and transparencies for accurate reconstruction of semi-transparent multimaterial features. Together, these capabilities will provide manufacturers and scientists with real-time 4D insight into the printing process, supporting sustainable, efficient, and reliable fabrication of biological and multifunctional materials.

Authors: Elise Yang, Joshua Cheung, Colin Ophus, Eric Darve, Natalie Larson

Bio: Elise Yang is a first-year PhD student in Mechanical Engineering at Stanford University in the Larson Lab (multimaterial additive manufacturing & 4D imaging) and Darve Group (computational & mathematical engineering). She received her BS in Mechanical Engineering with a minor in Computer Science at Columbia University and has interdisciplinary industry and research experiences in real-time data modeling for manufacturing and medical devices. Her research interests include volumetric imaging, design & manufacturing, and computational methods.


Title: Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization

Abstract: Reverse engineering 3D computer-aided design (CAD) models from images is an important task for many downstream applications including interactive editing, manufacturing, architecture, robotics, etc. The difficulty of the task lies in vast representational disparities between the CAD output and the image input. CAD models are precise, programmatic constructs that involve sequential operations combining discrete command structure with continuous attributes, making it challenging to learn and optimize in an end-to-end fashion. Concurrently, input images introduce inherent challenges such as photometric variability and sensor noise, complicating the reverse engineering process. In this work, we introduce a novel approach that conditionally factorizes the task into two sub-problems. First, we leverage vision-language foundation models (VLMs), a finetuned Llama3.2, to predict the global discrete base structure with semantic information. Second, we propose TrAssembler that, conditioned on the discrete structure with semantics, predicts the continuous attribute values. To support the training of our TrAssembler, we further constructed an annotated CAD dataset of common objects from ShapeNet. Putting all together, our approach and data demonstrate significant first steps towards CAD-ifying images in the wild.

Authors: Yang You, Mikaela Angelina Uy, Jiaqi Han, Rahul Thomas, Haotong Zhang, Yi Du, Hansheng Chen, Francis Engelmann, Suya You, Leonidas Guibas

Bio: Yang You is a postdoc in the Geometric Computation group led by Prof. Leonidas Guibas at Stanford. His research interests include 3D graphics, 3D computer vision, and robotics.


Title: Deep Learning–Enabled Real-Time Optical Hologram Synthesis and Control for Neutral-Atom Quantum Processors

Abstract: The assembly of large-scale, defect-free neutral atom arrays is a prerequisite for scalable quantum simulation and computing. However, the stochastic loading of atoms requires real-time rearrangement within milliseconds to mitigate vacuum loss. Traditional serial rearrangement methods using Acousto-Optic Deflectors (AODs) scale with system size as O(N), becoming prohibitively slow for large arrays. Conversely, parallel holographic methods using Spatial Light Modulators (SLMs) have historically been hindered by the slow computation speed of iterative algorithms (e.g., Weighted Gerchberg-Saxton) and their inability to constrain optical phase, leading to destructive interference and atom loss during transport.
We present a deep learning framework to address these computational and physical bottlenecks. We implement a lightweight, supervised Convolutional Neural Network (CNN) that learns the inverse mapping from target trap positions and phases to SLM holograms. Unlike standard iterative solvers, our model enforces explicit phase continuity, ensuring stable trap depths during atom transport. We demonstrate that this architecture achieves high positional accuracy (~30 nm) and precise phase fidelity while reducing inference time to ~1 ms on a standard GPU. This approach enables the real-time, parallel O(1) assembly of thousands of atomic qubits, overcoming the scaling limitations of current rearrangement protocols.
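
A minimal sketch of this kind of supervised inverse mapping is shown below; the network depth, the single-FFT (Fraunhofer) propagation used in the loss, and the masked phase penalty standing in for the phase-continuity constraint are illustrative assumptions, not the authors' architecture.

```python
import math
import torch
import torch.nn as nn

class HologramNet(nn.Module):
    """Tiny CNN mapping a target trap-plane field (amplitude + phase channels)
    to an SLM phase pattern. Shapes and depth are illustrative only."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, target_field):          # (B, 2, N, N)
        return self.body(target_field)        # (B, 1, N, N) SLM phase in radians

def propagate(slm_phase):
    """Far field of a phase-only SLM under the Fraunhofer approximation."""
    field = torch.exp(1j * slm_phase.squeeze(1))
    return torch.fft.fftshift(torch.fft.fft2(field), dim=(-2, -1))

def hologram_loss(slm_phase, target_amp, target_phase, lam=0.1):
    """Supervise trap intensity and trap phase; the masked phase term is a
    stand-in for the explicit phase-continuity constraint described above."""
    out = propagate(slm_phase)
    amp_loss = ((out.abs() - target_amp) ** 2).mean()
    trap_mask = target_amp > 0
    phase_err = torch.remainder(out.angle() - target_phase + math.pi,
                                2 * math.pi) - math.pi     # wrap to [-pi, pi)
    phase_loss = phase_err[trap_mask].pow(2).mean()
    return amp_loss + lam * phase_loss
```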

Authors: Yang Xu, Timothy Chang, Nick Gharabaghi, Jenni Solgaard, Joonhee Choi

Bio: Yang Xu is a first-year PhD student in the Electrical Engineering department at Stanford University, advised by Prof. Joonhee Choi. Yang is currently building utility-scale neutral-atom quantum processors for both benchmarked analog quantum simulation and fault-tolerant quantum computation.

Title: GPU accelerated atomistic wave-optics simulation toolbox

Abstract: Advances in X-ray sources and detectors have enabled highly sophisticated experiments – from coherent diffraction imaging to dark-field X-ray microscopy (DFXM) – that probe the structure and strain of materials at micro- to nanometer length scales. As these experimental methods grow in complexity, there is a pressing need for equally sophisticated simulation tools that can predict and interpret X-ray scattering and transmission from atomistic models of materials to correctly describe the amplitude and phase contrast produced by an ensemble of atomic scale features. In this work, we present a highly parallel atomistic wave-optics toolbox capable of simulating the interaction of X-rays with macroscopic (up to mm-scale) atomic structures, together with optical components and arbitrary detector geometries. This level of performance is achieved through highly optimized algorithms designed to take full advantage of the compute capabilities present in modern graphical processing units (GPUs). Such simulation toolboxes allow researchers to connect microstructural features (dislocations, domains, defects) with the resulting X-ray signals, aiding experimental planning and the inversion of complex datasets into real-space information. This work also provides a key component of a future multiscale pipeline connecting finite-element, discrete dislocation dynamics, and atomic structure to X-ray imaging.

Authors: Dorian P. Luccioni, Leora E. Dresselhaus-Marais

Bio: Dorian Luccioni is a 2nd Year PhD student in the Materials Science and Engineering Department at Stanford University, advised by Prof. Leora Dresselhaus-Marais. His work is supported by the Stanford Graduate Fellowship as the Invetec Fellow. His research focuses on the development of diffraction contrast microscopy techniques and their supporting simulations, with a current focus on methods to meet the challenging requirements of shock physics.


Title: Knowledge-Driven Edge-Cloud Communication Framework

Abstract: Today’s mobile and embedded systems, such as smartphones, XR headsets, vehicles, and IoT sensors, generate rich, high-dimensional data that must be transmitted over severely constrained wireless links. Traditional compression and neural codecs, however, treat each signal in isolation, repeatedly spending bits to describe similar content across time and devices. We propose a knowledge-driven edge-cloud communication framework based on communication by reference: instead of always transmitting signals for reconstruction, the edge device and cloud share a vector space and a synchronized table of reference samples. For each new input, the edge either transmits a small index to the nearest neighbor in this table when confident, or falls back to a compact payload that also seeds the table for future reuse. We formalize this retrieval-based channel, derive a bandwidth model for its expected cost, and evaluate it for image transmission. On ImageNet-1k, our approach achieves 0.78 top-1 accuracy at 0.35 kB per sample and 0.83 at 2.83 kB, surpassing ProgJPEG (0.71 at 10 kB) and LimitNet (0.64 at 3.96 kB). A simple random projection further reduces memory footprint by shrinking table size by 3-6× with at most 1.6% absolute accuracy loss, providing a tunable memory-accuracy tradeoff for bandwidth- and energy-constrained edge–cloud communication.
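
A minimal sketch of the communication-by-reference idea is shown below; the cosine-similarity threshold, flat reference table, and method names are illustrative assumptions rather than the paper's design.

```python
import numpy as np

class ReferenceChannel:
    """Sketch of communication by reference: the edge keeps a table of
    reference embeddings mirrored in the cloud. Confident matches cost one
    index; misses cost a compact payload that also seeds the table."""

    def __init__(self, dim, threshold=0.9):
        self.table = np.empty((0, dim), dtype=np.float32)
        self.threshold = threshold

    def transmit(self, embedding, compact_payload):
        e = (embedding / np.linalg.norm(embedding)).astype(np.float32)
        if len(self.table):
            sims = self.table @ e                    # cosine similarity to every reference
            idx = int(np.argmax(sims))
            if sims[idx] >= self.threshold:
                return ("index", idx)                # a few bytes on the wire
        # Fallback: send the compact payload and add it as a new reference,
        # mirrored on both sides of the link for future reuse.
        self.table = np.vstack([self.table, e[None]])
        return ("payload", compact_payload)
```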

Authors: Chae Young Lee, Sara Achour, Zerina Kapetanovic

Bio: Chaeyoung is a third-year Ph.D. candidate in Computer Science at Stanford University working with Professors Zerina Kapetanovic and Sara Achour. She received her B.S. in Electrical Engineering & Computer Science from Yale in 2023, where she was advised by Professor Lin Zhong. Her research focuses on algorithmic and system optimization for resource-constrained hardware. She has been recognized as one of the Rising Stars by ACM MobiSys 2025 and received the N2Women Young Researcher Award 2025. Her work has appeared in ACM MobiCom, ACM ASPLOS, ACM MobiSys, and IEEE IROS.


Title: Ultrafast Acoustic Metrology for Characterizing Defects in Semiconductor Materials

Abstract: Picosecond acoustic techniques provide a direct, non-destructive method for probing the elastic and microstructural properties of semiconductor materials. A femtosecond pump pulse launches a broadband coherent acoustic phonon wavepacket, whose propagation is modified by defects through changes in velocity and frequency-dependent attenuation. Dislocations, stacking faults, alloy disorder, doping, and grain boundaries introduce additional scattering modes, producing measurable increases in attenuation and characteristic mode-conversion signatures. By analyzing the time-domain reflectivity and the resulting GHz acoustic spectrum, we can extract defect-sensitive parameters such as elastic softening, damping coefficients, and interface reflection amplitudes. This work outlines the physical basis and measurement framework for using picosecond acoustics as a future metrology tool for assessing semiconductor growth quality, microstructure, and strain.

Authors: Brinthan Kanesalingam and Leora Dresselhaus-Marais

Bio: Brinthan is a second year PhD student in Materials Science and Engineering, primarily focusing on studying the damping and dispersion of phonons in semiconductor materials.


Title: Enhancing Visual Perception for AMD Patients through Depth-Guided Object and Face Recognition

Abstract: Age-related macular degeneration (AMD) is a leading cause of central vision loss worldwide. It produces central scotomas that severely impair tasks requiring high-acuity vision such as reading, face recognition, and object identification, forcing patients to rely on peripheral vision with substantially lower spatial resolution, which complicates depth perception and object recognition. Head-mounted augmented reality (AR) glasses equipped with cameras and displays can partially mitigate these challenges by magnifying or enhancing visual information, but current mainstream AR solutions lack intelligent depth-aware processing. Retinal prostheses such as PRIMA restore central vision by converting camera images into optical patterns that stimulate surviving inner retinal neurons via a photovoltaic microchip implanted under the macula, yet the spatial resolution of the resulting percepts is fundamentally constrained by the limited pixel count and size of current implants. To address these limitations, this project introduces a lightweight depth-guided pipeline designed to maximize the utility of both PRIMA and AR glasses under strict spatial resolution constraints. The pipeline estimates dense depth maps from RGB input, automatically selects and prioritizes depth-based regions of interest such as faces and objects, and applies adaptive super-resolution and contrast enhancement within these regions. The enhanced outputs are then evaluated under simulated AMD viewing conditions to assess improvements in visual perception and recognition performance.

Authors: Raina Song

Bio: Raina is a final year master’s student in Electrical Engineering specializing in computer vision, image and signal processing, deep learning, reinforcement learning from human preferences, vision language models, and VR/AR. She previously interned on the Vision Pro team at Apple, working on multimodal human activity classification, object detection, and vision language models, and she is currently researching physics-informed fine-tuning of diffusion models for predicting evoked and spontaneous brain fMRI signals.


Title: Self Supervised Deep Priors for Solving Inverse Problems in Electron Microscopy

Abstract: Inverse problems in electron microscopy are intrinsically ill-posed due to dose limits, noise, incomplete sampling, and other experimental constraints. We present a self-supervised deep-prior framework that couples differentiable physics with implicit neural regularization to address two core tasks: phase retrieval in ptychography and volumetric reconstruction in tomography. In ptychography, CNN-based deep priors parameterize the complex specimen and probe within a mixed-state multislice forward model. This implicit regularization improves information limits at low dose, accelerates convergence (especially at low spatial frequencies), strengthens depth regularization, and largely removes the need for manual tuning. In tomography, we represent the volume with an implicit neural representation and jointly optimize projection poses, enabling fast inline alignment, inpainting of the missing wedge, and denoising from a single dataset without external training data. Across simulated and experimental datasets, our methods yield high-fidelity phase maps and 3D volumes from information-limited measurements. Integrating modern deep priors into physics-based forward models provides training-free reconstructions while reducing expertise barriers and computational cost, and suggests a general recipe for self-supervised inverse modeling across electron microscopy modalities.

Authors: Arthur R. C. McCray, Cedric Lim, Corneel Casert, Stephanie Ribet, Colin Ophus

Bio: Arthur McCray is a postdoctoral researcher in the Materials Science and Engineering Department working with Prof. Colin Ophus. His research focuses on solving inverse problems in electron microscopy and materials systems using machine learning. He received his PhD in Applied Physics from Northwestern University advised by Amanda Petford-Long.


Title: Long-Context State-Space Video World Models

Abstract: Video diffusion models have recently shown promise for world modeling through autoregressive frame prediction conditioned on actions. However, they struggle to maintain long-term memory due to the high computational cost associated with processing extended sequences in attention layers. To overcome this limitation, we propose a novel architecture leveraging state-space models (SSMs) to extend temporal memory without compromising computational efficiency. Unlike previous approaches that retrofit SSMs for non-causal vision tasks, our method fully exploits the inherent advantages of SSMs in causal sequence modeling.

Authors: Ryan Po, Yotam Nitzan, Richard Zhang, Berlin Chen, Tri Dao, Eli Shechtman, Gordon Wetzstein, Xun Huang

Bio: Ryan Po is a fourth-year Electrical Engineering PhD candidate at Stanford University, working in the Stanford Computational Imaging Lab under Prof. Gordon Wetzstein. His research centers on video world models—autoregressive, interactive video generators designed to simulate complex real-world dynamics. Recently, his work has focused on enhancing the long-term memory, stability, and controllability of these models.


Title: Vision Assisted Beamsteering for Full Control of Wireless Power Transfer and Joint Communication Link Budgets

Abstract: Wireless power transfer (WPT) is an increasingly popular area of research for electronic, robotic, and sensing systems. At scale, WPT reduces battery size, enables maintenance-free IoT deployments, and opens the door to simultaneous wireless information and power transfer (SWIPT). While existing beamsteering algorithms are effective at targeting stationary objects, it remains challenging to transfer power reliably if objects are moving or their location is not known. As a receiver moves and rotates, path loss, polarization mismatch, and antenna pattern misalignment can cause the harvested power to fluctuate dramatically. We present a vision-assisted WPT system that uses a single camera to estimate the target’s angle, distance, and orientation, and dynamically adjust the beam direction and transmit power to maintain a near-constant power at the receiver. This enables energy-efficient operation in dynamic environments and supports SWIPT use cases where a power beacon must also deliver data. Applications include robot-to-robot energy and information exchange, continuous powering of mobile sensing nodes, and vision-guided alignment for charging stations. The key idea is to fuse camera-based estimates of the receive antenna’s location and orientation with beamsteering to enable stable input power for a given target. We have developed initial vision model prototypes, including a simple convolutional neural network, YOLO, and RT-DETR, to infer location and distance, with the YOLO model performing best with an F1 score of 1 and a mean IoU of 0.94.
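
As one concrete piece of such a control loop, a free-space link-budget calculation can convert the camera's distance estimate into the transmit power needed to hold the received power constant. The sketch below uses the standard Friis relation; the antenna gains, frequency, and misalignment loss term are assumed example values, not the system's parameters.

```python
from math import pi, log10

def tx_power_dbm_for_target(p_rx_target_dbm, distance_m, freq_hz,
                            g_tx_dbi=12.0, g_rx_dbi=2.0, misalign_loss_db=0.0):
    """Free-space link-budget sketch: given a camera-based distance (and any
    estimated misalignment loss), solve the Friis equation for the transmit
    power that keeps the received power at the target level."""
    wavelength = 3e8 / freq_hz
    fspl_db = 20 * log10(4 * pi * distance_m / wavelength)   # free-space path loss
    return p_rx_target_dbm + fspl_db + misalign_loss_db - g_tx_dbi - g_rx_dbi

# Example: hold -10 dBm at the receiver at 2.4 GHz from 3 m away (~25.6 dBm TX).
print(tx_power_dbm_for_target(-10.0, 3.0, 2.4e9))
```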

Authors: Jasmin Falconer, Geneva Ecola, Zerina Kapetanovic 

Bio: Jasmin Falconer and Geneva Ecola are PhD candidates in Zerina Kapetanovic’s S4 Lab. Jasmin is interested in developing smart systems including sensing for robotics, plant health, and women’s health. Geneva is interested in enabling low power wireless sensing and communication systems.


Title: Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors

Abstract: We present Buffer Anytime, a framework for estimation of depth and normal maps (which we call geometric buffers) from video that eliminates the need for paired video–depth and video–normal training data. Instead of relying on large-scale annotated video datasets, we demonstrate high-quality video buffer estimation by leveraging single-image priors with temporal consistency constraints. Our zero-shot training strategy combines state-of-the-art image estimation models with optical-flow-based smoothness constraints through a hybrid loss function, implemented via a lightweight temporal attention architecture. Applied to leading image models like Depth Anything V2 and Marigold-E2E-FT, our approach significantly improves temporal consistency while maintaining accuracy. Experiments show that our method not only outperforms image-based approaches but also achieves results comparable to state-of-the-art video models trained on large-scale paired video datasets, despite using no such paired video data.
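
A typical optical-flow-based temporal consistency term, of the kind such a hybrid loss builds on, can be sketched as follows; the warping direction, validity masking, and L1 penalty here are generic assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(pred_t, pred_prev, flow_t_to_prev, valid_mask):
    """Generic flow-based consistency penalty (illustrative).

    pred_t, pred_prev: predicted buffers for frames t and t-1, shape (B, C, H, W).
    flow_t_to_prev:    backward optical flow in pixels, shape (B, 2, H, W),
                       mapping each pixel in frame t to its source in frame t-1.
    valid_mask:        (B, 1, H, W) mask that zeros out occluded / invalid pixels.
    """
    b, _, h, w = pred_t.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().to(pred_t.device)    # (H, W, 2) pixels
    coords = base[None] + flow_t_to_prev.permute(0, 2, 3, 1)          # (B, H, W, 2)
    # Normalize pixel coordinates to the [-1, 1] range expected by grid_sample.
    gx = 2.0 * coords[..., 0] / (w - 1) - 1.0
    gy = 2.0 * coords[..., 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)
    warped_prev = F.grid_sample(pred_prev, grid, align_corners=True)
    return (valid_mask * (pred_t - warped_prev).abs()).mean()
```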

Authors: Zhengfei Kuang, Tianyuan Zhang, Kai Zhang, Hao Tan, Sai Bi, Yiwei Hu, Zexiang Xu, Milos Hasan, Gordon Wetzstein, Fujun Luan

Bio: Zhengfei Kuang is a fourth-year Ph.D. student in Computer Science advised by Prof. Gordon Wetzstein. His main research interests are 3D computer vision, video generative models, and neural rendering.