2022 SCIEN Affiliates Meeting Poster Presentations
Index to Posters
3D quantitative-amplified Magnetic Resonance Imaging (3D q-aMRI) by Itamar Terem, Nan Wang, Kyan Younes, Hillary Vossler, Elizabeth Mormino, Samantha Holdsworth and Kawin Setsompop
Ultralight Night Vision Imaging System Without External Power Supply by Manchen Hu, Emma Belliveau, Natalia Murrietta, Pournima Narayanan, Dan Congreve
Generative Neural Articulated Radiance Fields by Alexander W. Bergman, Petr Kellnhofer, Wang Yifan, Eric R. Chan, David B. Lindell, Gordon Wetzstein
Simulating Dual Pixel Optics in a World of Trade Secrets by Thomas Goossens, Andrey Yatsunenko, Joyce Farrell, Brian Wandell
Learning Controllable Adaptive Simulation for Multi-scale Physics by Tailin Wu*, Takashi Maruyama*, Qingqing Zhao*, Gordon Wetzstein, Jure Leskovec
Learning Spatially Varying Pixel Exposures for Motion Deblurring by Cindy M. Nguyen, Julien N.P. Martel, Gordon Wetzstein
Time-multiplexed Neural Holography: A flexible framework for holographic near-eye displays with fast heavily-quantized spatial light modulators by Suyeon Choi, Manu Gopakumar, Jonghyun Kim, Yifan Peng, Matthew O’Toole, and Gordon Wetzstein
Holographic Glasses for Virtual Reality by Manu Gopakumar, Jonghyun Kim, Suyeon Choi, Yifan Peng, Ward Lopes, Gordon Wetzstein
Vitruvio: 3D Building Meshes via Single Perspective Sketches by Alberto Tono, Martin Fischer
Seeing Far in the Dark with Patterned Flash by Zhanghao Sun, Yicheng Wu, Jian Wang, Shree Nayar
A Photoacoustic Airborne Sonar System by Aidan Fitzpatrick, Ajay Singhvi, Amin Arbabian
Generative Novel View Synthesis with 3D-Aware Diffusion Models by Eric Ryan Chan, Koki Nagano, Jeong Joon Park, Matthew Chan, Alexander William Bergman, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, Gordon Wetzstein
CryoFIRE: Amortized Inference for Heterogeneous Reconstruction in Cryo-EM by Axel Levy, Gordon Wetzstein, Julien Martel, Frederic Poitevin, Ellen Zhong
A Focal-plane Code-domain Active Imager for Automotive Applications by Yinuo Xu and Tom Lee
MantissaCam: Learning Snapshot High-dynamic-range Imaging with Perceptually-based In-pixel Irradiance Encoding by Haley M. So, Julien N. P. Martel, Piotr Dudek, Gordon Wetzstein
Crosstalk Elimination by Rearranging Color Filters for Hyperspectral Imaging by Thomas Goossens, Brian Wandell
Abstracts
Title: 3D quantitative-amplified Magnetic Resonance Imaging (3D q-aMRI)
Authors: Itamar Terem, Nan Wang, Kyan Younes, Hillary Vossler, Elizabeth Mormino, Samantha Holdsworth and Kawin Setsompop
Abstract: Amplified Magnetic Resonance Imaging (aMRI) is a pulsatile brain motion visualization method that delivers ‘videos’ with high contrast and temporal resolution. aMRI has been shown to be a promising tool in various neurological disorders. However, aMRI currently lacks the ability to quantify the sub-voxel motion field in physical units. Here, we introduce a novel 3D quantitative aMRI (3D q-aMRI) algorithm, which quantifies the sub-voxel motion of the 3D aMRI signal. 3D q-aMRI is validated on a digital phantom and in-vivo model, which may open up applications in neurological conditions that benefit from understanding altered patterns of brain motion.
Bio: Itamar Terem is a PhD student in the Department of Electrical Engineering at Stanford University and an NSF Graduate Research Fellow. His research focuses on the development of computational and acquisition MRI techniques to explore cerebrospinal fluid (CSF) dynamics (drivers and motion) through the brain's ventricular system, subarachnoid space, and perivascular space during wakefulness and sleep.
Title: Ultralight Night Vision Imaging System Without External Power Supply
Authors: Manchen Hu, Emma Belliveau, Natalia Murrietta, Pournima Narayanan, Dan Congreve
Abstract: Night vision today is generally enabled by high-voltage image intensifier tubes, which convert photons into an electron cascade that is ultimately converted into sufficiently bright visible light via a phosphor screen. This approach requires substantial external power and long intensifier tubes. Passive, linear upconversion of light from the infrared into the visible holds the potential to revolutionize night vision by reducing the size and weight of night-vision systems and obviating the need for external power. In this work, by combining innovative designs of semiconductor materials and optoelectronic device structures, we demonstrate a prototype ultralight night vision imaging system that enables sight over a broad band in the near-infrared (NIR). Without any power supply, this system converts NIR light into visible images that can be seen with the naked eye.
Bio: Manchen Hu is a Ph.D. candidate in the Department of Electrical Engineering at Stanford University. Manchen's research in Professor Dan Congreve's lab deals with thin-film optoelectronic devices that can upconvert infrared photons into visible photons. He is interested in light-matter interactions and works on the combination of optics, electronics, and materials to enable novel device functionality.
Title: Generative Neural Articulated Radiance Fields
Authors: Alexander W. Bergman, Petr Kellnhofer, Wang Yifan, Eric R. Chan, David B. Lindell, Gordon Wetzstein
Abstract: Unsupervised learning of 3D-aware generative adversarial networks (GANs) using only collections of single-view 2D photographs has very recently made much progress. These 3D GANs, however, have not been demonstrated for human bodies and the generated radiance fields of existing frameworks are not directly editable, limiting their applicability in downstream tasks. We propose a solution to these challenges by developing a 3D GAN framework that learns to generate radiance fields of human bodies or faces in a canonical pose and warp them using an explicit deformation field into a desired body pose or facial expression. Using our framework, we demonstrate the first high-quality radiance field generation results for human bodies. Moreover, we show that our deformation-aware training procedure significantly improves the quality of generated bodies or faces when editing their poses or facial expressions compared to a 3D GAN that is not trained with explicit deformations.
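The core idea, generating a radiance field in a canonical pose and querying it at backward-warped sample points, can be illustrated with a minimal numpy sketch. The toy field, the deformation, and all names below are illustrative placeholders, not the paper's architecture:

```python
import numpy as np

def canonical_radiance(x):
    """Toy stand-in for a generated canonical-pose radiance field.

    Maps 3D points (N, 3) to (density, rgb). A real model would be a
    neural field conditioned on a latent code; this placeholder is
    purely illustrative.
    """
    density = np.exp(-np.sum(x**2, axis=-1))       # blob at the origin
    rgb = 0.5 * (np.tanh(x) + 1.0)                 # arbitrary colors
    return density, rgb

def warp_to_canonical(x_posed, deformation):
    """Explicit backward warp: posed-space points -> canonical space.

    `deformation` is any invertible mapping (e.g., derived from skeletal
    blend skinning for bodies or an expression model for faces).
    """
    return deformation(x_posed)

# Render a posed sample by querying the canonical field at warped points.
x_posed = np.random.randn(1024, 3)
deformation = lambda p: p + np.array([0.0, 0.2, 0.0])   # toy rigid shift
density, rgb = canonical_radiance(warp_to_canonical(x_posed, deformation))
```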
Bio: Alexander W. Bergman is a fifth year PhD student in the Stanford Computational Imaging Lab. His research interests include neural rendering and 3D imaging.
Title: Simulating Dual Pixel Optics in a World of Trade Secrets
Authors: Thomas Goossens, Andrey Yatsunenko, Brian Wandell, Joyce Farrell
Abstract: Designing image systems is a complex task, and the process can benefit from simulation tools that model all of the system components. A significant challenge in simulating an imaging system is obtaining accurate models of all the components. Understandably, manufacturers may want to protect their intellectual property, making them reluctant to share details that would be important to an accurate simulation. In this talk, I will discuss how we used black-box (phenomenological) models to simulate a consumer camera with a proprietary lens design and proprietary pixel optics (microlens and dual pixel). These models can be used for evaluating a system even before building a physical prototype. Furthermore, an accurate simulator can be used to generate a large number of synthetic images which can be used to train or test neural networks.
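As one illustration of what a black-box pixel-optics model can look like, the sketch below fits a simple logistic split of the dual-pixel response to hypothetical calibration measurements. The functional form, parameter names, and data are assumptions for illustration, not the actual model used in this work:

```python
import numpy as np
from scipy.optimize import curve_fit

def dual_pixel_response(angle_deg, k, angle0):
    """Phenomenological left/right sub-pixel responses vs. chief-ray angle.

    A logistic split of the total pixel response is one simple black-box
    choice; the true microlens/dual-pixel behavior is proprietary, so this
    form must be fitted to calibration measurements.
    """
    left = 1.0 / (1.0 + np.exp(-k * (angle_deg - angle0)))
    return np.stack([left, 1.0 - left], axis=-1)

# Fit the black-box model to (hypothetical) calibration data: measured
# left-sub-pixel fraction as a function of incidence angle.
angles = np.linspace(-20, 20, 41)
measured_left_fraction = 1.0 / (1.0 + np.exp(-0.3 * angles)) + 0.01 * np.random.randn(41)
(k_fit, angle0_fit), _ = curve_fit(
    lambda a, k, a0: dual_pixel_response(a, k, a0)[:, 0],
    angles, measured_left_fraction, p0=(0.2, 0.0),
)
```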
Bio: Thomas Goossens is a postdoctoral fellow at Stanford University working on camera simulation. He obtained his PhD at KU Leuven in Belgium in collaboration with IMEC, working on hyperspectral imaging.
Title: Learning Controllable Adaptive Simulation for Multi-scale Physics
Authors: Tailin Wu*, Takashi Maruyama*, Qingqing Zhao*, Gordon Wetzstein, Jure Leskovec
Abstract: Simulating the time evolution of physical systems is pivotal in many scientific and engineering problems. An open challenge in simulating such systems is their multi-scale dynamics: a small fraction of the system is extremely dynamic and requires very fine-grained resolution, while the majority of the system is changing slowly and can be modeled at coarser spatial scales. Typical learning-based surrogate models use a uniform spatial scale, which needs to resolve to the finest required scale and can waste a huge amount of compute to achieve the required accuracy. In this work, we introduce Learning controllable Adaptive simulation for Multi-scale Physics (LAMP), the first fully deep-learning-based surrogate model that jointly learns the evolution model and optimizes appropriate spatial resolutions that devote more compute to the highly dynamic regions. LAMP consists of a Graph Neural Network (GNN) for learning the forward evolution, and a GNN-based actor-critic for learning the policy of spatial refinement and coarsening. We introduce learning techniques that optimize LAMP with a weighted sum of error and computational cost as the objective, which allows LAMP to adapt to the varying relative importance of error vs. computation at inference time. We test our method on a 1D benchmark of nonlinear PDEs and a challenging 2D mesh-based simulation. We demonstrate that LAMP outperforms state-of-the-art deep learning surrogate models with up to 39.3% error reduction, and is able to adaptively trade off computation to improve long-term prediction error.
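The controllable error-vs-compute trade-off can be summarized by the weighted objective mentioned above. A minimal sketch with illustrative names (beta, pred_error, compute_cost), not the paper's notation:

```python
# Weighted objective: beta steers the trade-off between prediction error
# and computational cost spent on spatial refinement.
def lamp_objective(pred_error, compute_cost, beta):
    return (1.0 - beta) * pred_error + beta * compute_cost

# At inference time the same trained policy can be steered by changing beta:
accurate_but_costly = lamp_objective(pred_error=0.05, compute_cost=0.90, beta=0.1)
cheap_but_coarse = lamp_objective(pred_error=0.12, compute_cost=0.30, beta=0.8)
```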
Bio: Qingqing Zhao is a third-year Ph.D. student in the Stanford Computational Imaging Lab. She is interested in ML for physics simulations and inverse problems.
Title: Learning Spatially Varying Pixel Exposures for Motion Deblurring
Authors: Cindy M. Nguyen, Julien N.P. Martel, Gordon Wetzstein
Abstract: Computationally removing the motion blur introduced by camera shake or object motion in a captured image remains a challenging task in computational photography. Deblurring methods are often limited by the fixed global exposure time of the image capture process. The post-processing algorithm must either deblur a longer exposure that contains relatively little noise or denoise a short exposure that intentionally removes the opportunity for blur at the cost of increased noise. We present a novel approach that leverages spatially varying pixel exposures for motion deblurring using next-generation focal-plane sensor–processors, along with an end-to-end design of these exposures and a machine learning–based motion-deblurring framework. We demonstrate in simulation and with a physical prototype that learned spatially varying pixel exposures (L-SVPE) can successfully deblur scenes while recovering high-frequency detail.
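To make the capture model concrete, here is a minimal numpy sketch of a spatially varying pixel exposure: each pixel integrates a different number of frames according to a per-pixel exposure map. Shapes, names, and the toy noise model are illustrative assumptions, not the prototype's:

```python
import numpy as np

def capture_with_svpe(frames, exposure_map):
    """Simulate a spatially varying pixel exposure (SVPE) capture.

    frames:       (T, H, W) scene irradiance over the capture window
    exposure_map: (H, W) integers in [1, T], per-pixel exposure length
    Each pixel integrates only its first exposure_map[y, x] frames, so
    short-exposure pixels freeze motion (but are noisier) while
    long-exposure pixels blur (but average out noise).
    """
    T, H, W = frames.shape
    t_idx = np.arange(T)[:, None, None]                       # (T, 1, 1)
    mask = (t_idx < exposure_map[None]).astype(frames.dtype)  # (T, H, W)
    capture = (frames * mask).sum(axis=0) / exposure_map
    capture += np.random.normal(0.0, 0.01 / np.sqrt(exposure_map))  # toy noise
    return capture

frames = np.random.rand(8, 64, 64)                            # toy moving scene
exposure_map = np.random.choice([1, 4, 8], size=(64, 64))
coded_capture = capture_with_svpe(frames, exposure_map)
```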
Bio: Cindy Nguyen is a fourth-year PhD student, advised by Gordon Wetzstein in the Stanford Computational Imaging lab. Her background is in task-specific end-to-end camera design, including systems for single-shot monocular depth estimation and motion deblurring. She is interested in imaging problems around depth estimation, deblurring, denoising, and HDR.
Title: Time-multiplexed Neural Holography: A flexible framework for holographic near-eye displays with fast heavily-quantized spatial light modulators
Authors: Suyeon Choi, Manu Gopakumar, Jonghyun Kim, Yifan Peng, Matthew O’Toole, and Gordon Wetzstein
Abstract: Holographic near-eye displays offer unprecedented capabilities for virtual and augmented reality systems, including perceptually important focus cues. Although artificial intelligence–driven algorithms for computer-generated holography (CGH) have recently made much progress in improving the image quality and synthesis efficiency of holograms, these algorithms are not directly applicable to emerging phase-only spatial light modulators (SLMs) that are extremely fast but offer phase control with very limited precision. The speed of these SLMs offers time-multiplexing capabilities, essentially enabling partially coherent holographic display modes. Here we report advances in camera-calibrated wave propagation models for these types of near-eye holographic displays, and we develop a CGH framework that robustly optimizes the heavily quantized phase patterns of fast SLMs. Our framework is flexible in supporting runtime supervision with different types of content, including 2D and 2.5D RGBD images, 3D focal stacks, and 4D light fields. Using our framework, we demonstrate state-of-the-art results for all of these scenarios in simulation and experiment.
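A minimal sketch of the time-multiplexing idea, assuming a textbook angular-spectrum propagator in place of the paper's camera-calibrated model: several heavily quantized phase frames are propagated and their intensities averaged. All parameter values and names are illustrative:

```python
import numpy as np

def angular_spectrum(field, wavelength, pitch, z):
    """Free-space propagation of a complex SLM field by distance z (meters)."""
    H, W = field.shape
    fx = np.fft.fftfreq(W, d=pitch)
    fy = np.fft.fftfreq(H, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kernel = np.exp(1j * 2 * np.pi * z * np.sqrt(np.maximum(arg, 0.0)))
    return np.fft.ifft2(np.fft.fft2(field) * kernel)

def time_multiplexed_intensity(phases, wavelength=520e-9, pitch=8e-6, z=0.05):
    """Average the intensities of several heavily quantized phase frames.

    `phases` is a list of (H, W) phase patterns already quantized to the
    SLM's few available levels; the eye/camera averages their intensities
    over time.
    """
    fields = [angular_spectrum(np.exp(1j * p), wavelength, pitch, z) for p in phases]
    return np.mean([np.abs(f) ** 2 for f in fields], axis=0)

# e.g. eight binary-phase frames (0 or pi), time-averaged at the target plane
phases = [np.pi * np.random.randint(0, 2, (256, 256)) for _ in range(8)]
image = time_multiplexed_intensity(phases)
```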
Bio: Suyeon Choi is a third-year PhD student in the Stanford Computational Imaging Lab, advised by Prof. Gordon Wetzstein. He is generally interested in developing 3D display hardware systems with novel algorithmic frameworks. Lately, he has been developing holographic display systems incorporating machine learning toward next-generation VR/AR displays. His research has been partly supported by a Meta Research PhD Fellowship, a Kwanjeong Scholarship, a Korean Government Scholarship, and a GPU gift from NVIDIA.
Title: Holographic Glasses for Virtual Reality
Authors: Manu Gopakumar, Jonghyun Kim, Suyeon Choi, Yifan Peng, Ward Lopes, Gordon Wetzstein
Abstract: We present Holographic Glasses, a holographic near-eye display system with an eyeglasses-like form factor for virtual reality. Holographic Glasses are composed of a pupil-replicating waveguide, a spatial light modulator, and a geometric phase lens to create holographic images in a lightweight and thin form factor. The proposed design can deliver full-color 3D holographic images using an optical stack of 2.5 mm thickness. A novel pupil-high-order gradient descent algorithm is presented for correct phase calculation with the user's varying pupil size. We implement benchtop and wearable prototypes for testing. Our binocular wearable prototype provides a diagonal field of view of 22.8°, a 2.3 mm static and 8 mm dynamic eye box, and support for 3D focus cues, while weighing only 60 g, excluding the driving board.
Bio: Manu Gopakumar is a PhD student in the Department of Electrical Engineering at Stanford University. His research interests are centered on the co-design of optical systems and computational algorithms. More specifically, his current research focuses on utilizing novel computational algorithms to enable higher quality 3D holography and more compact form-factors for holographic displays. He received his bachelor’s and master’s degree from Carnegie Mellon University.
Title: Vitruvio: 3D Building Meshes via Single Perspective Sketches
Authors: Alberto Tono, Martin Fischer
Abstract: Today's architecture, engineering, and construction (AEC) software requires a steep learning curve to generate a three-dimensional building representation. This limits the ability to quickly validate the volumetric implications of an initial design idea communicated via a single sketch. Allowing designers to translate a single sketch to a 3D building will enable owners to instantly visualize 3D project information without the cognitive load otherwise required. While previous state-of-the-art (SOTA) methods for single-view reconstruction (SVR) showed outstanding results in data-driven reconstruction from a single image or sketch, they lacked specific applications, analyses, and experiments in the AEC domain. This research addresses that gap by introducing a deep learning method, Vitruvio, which adapts the Occupancy Network for SVR tasks on a specific building dataset (Manhattan 1K). This adaptation brings two main improvements. First, it accelerates the inference process by more than 26% (from 0.5 s to 0.37 s). Second, it increases the reconstruction accuracy (measured by the Chamfer distance) by 18%. During this adaptation to the AEC domain, we evaluate the effect of building orientation in the learning procedure, since it constitutes an important design factor. While aligning all the buildings to a canonical pose improved the overall quantitative metrics, it did not capture fine-grained details in more complex building shapes (as shown in our qualitative analysis). Finally, Vitruvio outputs a 3D-printable building mesh with arbitrary topology and genus from a single perspective sketch, providing a step toward allowing owners and designers to communicate 3D information via an effective, intuitive, and universal 2D communication medium: the sketch.
Bio: Alberto Tono is a PhD student at Stanford University under the supervision of Kumagai Professor Martin Fischer. As president of the Computational Design Institute, he is exploring ways in which the convergence between the digital world and the humanities can facilitate cross-pollination between different industries within an ethical framework. He served as a research and computational design leader in architectural and engineering organizations, receiving an O-1 visa for outstanding abilities with both HOK and HDR. Tono obtained his master's in Building Engineering – Architecture from the University of Padua and the Harbin Institute of Technology. He has been working in the computational design and deep learning space since 2014, and he is improving Building Information Modeling and Virtual Design and Construction (BIM/VDC) workflows within a statistical framework to optimize the sustainability impact of these processes; he is also LEED AP certified. He is an international multi-award-winning "hacker" and speaker, and his work in architecture and artificial intelligence has taken him to companies in China, the Netherlands, Italy, and California. Thanks to his multidisciplinary approach, he worked as a data scientist and geometric deep learning researcher at Physna/Thangs, helping to raise over 80 million while working on 3D search and monocular 3D shape retrieval problems. He is devoting his life to Hannah Tono's happiness. Since they both are passionate about new technologies, he developed an augmented reality wedding proposal. Together they run a program called Dreamship to help pediatric palliative care hospices adopt immersive technologies through research.
Title: Seeing Far in the Dark with Patterned Flash
Authors: Zhanghao Sun, Yicheng Wu, Jian Wang, Shree Nayar
Abstract: Flash illumination is widely used in imaging under low-light environments. However, illumination intensity falls off with propagation distance quadratically, which poses significant challenges for flash imaging at a long distance. We propose a new flash technique, named “patterned flash”, for flash imaging at a long distance. Patterned flash concentrates optical power into a dot array. Compared with the conventional uniform flash where the signal is overwhelmed by the noise everywhere, patterned flash provides stronger signals at sparsely distributed points across the field of view to ensure the signals at those points stand out from the sensor noise. This enables post-processing to resolve important objects and details. Additionally, the patterned flash projects texture onto the scene, which can be treated as a structured light system for depth perception. Given the novel system, we develop a joint image reconstruction and depth estimation algorithm with a convolutional neural network. We build a hardware prototype and test the proposed flash technique on various scenes. The experimental results demonstrate that our patterned flash has significantly better performance at long distances in low-light environments.
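The long-range benefit comes from power concentration: with a fixed optical budget, putting the light into K dots rather than spreading it across all pixels raises the per-dot signal by the concentration factor, while both options fall off as 1/distance². A back-of-the-envelope sketch with purely illustrative numbers (resolution, dot count, noise floor are assumptions):

```python
import numpy as np

power = 1.0                    # total flash power (arbitrary units)
n_pixels = 640 * 480           # pixels covered by a uniform flash
k_dots = 32 * 24               # dots in the patterned flash
noise_floor = 1e-8             # toy sensor noise level

for distance in (5.0, 20.0, 50.0):          # meters
    falloff = 1.0 / distance**2              # inverse-square illumination falloff
    uniform_signal = power / n_pixels * falloff
    dot_signal = power / k_dots * falloff    # concentrated into sparse dots
    print(f"{distance:5.1f} m  uniform SNR {uniform_signal / noise_floor:9.2f}"
          f"  patterned SNR {dot_signal / noise_floor:9.2f}")
```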
Bio: Zhanghao Sun is a fifth-year PhD student in Electrical Engineering at Stanford University. He works mainly in the field of computational imaging hardware and algorithms. He is advised by Prof. Olav Solgaard and co-advised by Prof. Gordon Wetzstein.
Title: A Photoacoustic Airborne Sonar System
Authors: Aidan Fitzpatrick, Ajay Singhvi, Amin Arbabian
Abstract: Sonar imaging allows for exploration of areas of the ocean that are not readily accessible for direct observation albeit at a slow rate. To overcome the limitations of sonar, there is a push to develop an airborne system that can increase the speed and thus spatial coverage of underwater imaging. We present a system concept which maintains the advantages of conventional in-water sonar while operating entirely from an airborne platform. The proposed system translates airborne optical excitation into an underwater acoustic source through the laser-induced photoacoustic effect and employs air-coupled ultrasound transducers to detect acoustic echoes from the underwater scene. By combining the unique advantages of light and sound, our system could make oceans transparent and enable large-scale, high-throughput underwater sensing.
Bio: Aidan Fitzpatrick received the B.S. degree in electrical and computer engineering from the University of Massachusetts Amherst in 2018, where he performed research on antenna design and RF system design, and the M.S. degree in electrical engineering from Stanford University in 2020, where he is currently pursuing the Ph.D. degree in electrical engineering. His research interests are in computational imaging and perception systems—specifically at the intersection of electromagnetics, acoustics, and signal processing for the co-design of imaging algorithms and system hardware. His current projects focus on remote sensing applications of non-contact thermoacoustic/photoacoustic imaging.
Title: Generative Novel View Synthesis with 3D-Aware Diffusion Models
Authors: Eric Ryan Chan, Koki Nagano, Jeong Joon Park, Matthew Chan, Alexander William Bergman, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, Gordon Wetzstein
Abstract: We present a diffusion-based model for 3D-aware generative novel view synthesis from one or more input images. Our model samples from the distribution of possible renderings consistent with the input and, even in the presence of ambiguity, is capable of rendering diverse, high-fidelity, and plausible novel views. To achieve this, our method makes use of existing 2D diffusion backbones but, crucially, we incorporate geometry priors in the form of a 3D feature volume. This latent feature field captures the distribution over possible scene representations and improves our method’s ability to generate view-consistent novel renderings. In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent images. We demonstrate state-of-the-art results on synthetic renderings and room-scale scenes; we also show compelling results for challenging, real-world objects.
Bio: I’m Eric, a first-year Ph.D. student at Stanford where I’m advised by Gordon Wetzstein and Jiajun Wu. After studying mechanical engineering and computer science at Yale, I began learning the basics of computer vision in the hope of teaching robots and algorithms how to better understand the world around them. Over the last couple of years, my focus has shifted to the intersection of 3D graphics and vision—to generalization across 3D representations and 3D generative models. Find more at ericryanchan.github.io
Title: CryoFIRE: Amortized Inference for Heterogeneous Reconstruction in Cryo-EM
Authors: Axel Levy, Gordon Wetzstein, Julien Martel, Frederic Poitevin, Ellen Zhong
Abstract: Cryo-electron microscopy (cryo-EM) is an imaging modality that provides unique insights into the dynamics of proteins and other building blocks of life. The algorithmic challenge of jointly estimating the poses, 3D structure, and conformational heterogeneity of a biomolecule from millions of noisy and randomly oriented 2D projections in a computationally efficient manner, however, remains unsolved. Our method, cryoFIRE, performs ab initio heterogeneous reconstruction with unknown poses in an amortized framework, thereby avoiding the computationally expensive step of pose search while enabling the analysis of conformational heterogeneity. Poses and conformation are jointly estimated by an encoder while a physics-based decoder aggregates the images into an implicit neural representation of the conformational space. We show that our method can provide one order of magnitude speedup on datasets containing millions of images without any loss of accuracy. We validate that the joint estimation of poses and conformations can be amortized over the size of the dataset. For the first time, we prove that an amortized method can extract interpretable dynamic information from experimental datasets.
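Amortization means a single feed-forward encoder replaces per-image pose search. A toy PyTorch skeleton of such an encoder, with illustrative layer sizes and an assumed 6D-rotation-plus-latent parameterization (not the paper's architecture):

```python
import torch
import torch.nn as nn

class AmortizedCryoEncoder(nn.Module):
    """Toy amortized encoder: projection image -> (pose, conformation) estimates.

    A stand-in for the amortized-inference idea; the parameterization
    (6D rotation + 2D translation proxy + latent z) and layer sizes are
    illustrative, not cryoFIRE's.
    """
    def __init__(self, img_size=64, z_dim=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(),
            nn.Linear(img_size * img_size, 256), nn.ReLU(),
            nn.Linear(256, 6 + 2 + z_dim),   # rotation (6D) + in-plane shift + z
        )

    def forward(self, imgs):
        out = self.backbone(imgs)
        rot6d, shift, z = out[:, :6], out[:, 6:8], out[:, 8:]
        return rot6d, shift, z

# One amortized forward pass per batch replaces per-image pose search:
imgs = torch.randn(32, 64, 64)          # a batch of noisy projections
rot6d, shift, z = AmortizedCryoEncoder()(imgs)
```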
Bio: Axel Levy is a third-year PhD student in the EE department. He is jointly supervised by Prof. Mike Dunne and Prof. Gordon Wetzstein. His research focuses on the application of inverse rendering methods to scientific imaging (e.g., inferring the 3D structure of proteins from cryo-EM images), and more generally on solving inverse problems that arise in physics and biology.
Title: A Focal-plane Code-domain Active Imager for Automotive Applications
Authors: Yinuo Xu and Tom Lee
Abstract: Advanced driver-assistance systems demand low-cost, high-spatial-resolution mm-wave sensors. To achieve sub-degree angular resolution, a large array requires a simple front-end design for more efficient scaling of array elements. In this poster, we present a focal-plane mm-wave imager in a fully scalable architecture. The receiver consists of a plastic flat lens for passive beamforming and an integrated imaging array with pixel-level compute and memory.
Bio: Yinuo Xu is a PhD student in the EE department, supervised by Prof. Tom Lee. Her research focuses on building new imaging systems for automotive applications.
Title: MantissaCam: Learning Snapshot High-dynamic-range Imaging with Perceptually-based In-pixel Irradiance Encoding
Authors: Haley M. So, Julien N. P. Martel, Piotr Dudek, Gordon Wetzstein
Abstract: The ability to image high-dynamic-range (HDR) scenes is crucial in many computer vision applications. The dynamic range of conventional sensors, however, is fundamentally limited by their well capacity, resulting in saturation of bright scene parts. To overcome this limitation, emerging sensors offer in-pixel processing capabilities to encode the incident irradiance. Among the most promising encoding schemes is modulo wrapping, which results in a computational photography problem where the HDR scene is computed by an irradiance unwrapping algorithm from the wrapped low-dynamic-range (LDR) sensor image. Here, we design a neural network-based algorithm that outperforms previous irradiance unwrapping methods and we design a perceptually inspired “mantissa,” or log-modulo, encoding scheme that more efficiently wraps an HDR scene into an LDR sensor. Combined with our reconstruction framework, MantissaCam achieves state-of-the-art results among modulo-type snapshot HDR imaging approaches. We demonstrate the efficacy of our method in simulation and show benefits of our algorithm on modulo images captured with a prototype implemented with a programmable sensor.
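To make the two encodings concrete, the small numpy sketch below contrasts plain modulo wrapping with one plausible reading of the log-modulo ("mantissa") idea, wrapping the fractional part of the log-irradiance. This is illustrative only; the exact in-pixel encoding used by MantissaCam may differ:

```python
import numpy as np

def modulo_encode(irradiance, well_capacity=1.0):
    """Plain modulo wrapping: the pixel value wraps every time it saturates."""
    return np.mod(irradiance, well_capacity)

def mantissa_encode(irradiance, eps=1e-6):
    """Illustrative log-modulo ("mantissa") encoding.

    Wrap the base-2 logarithm of irradiance, i.e. keep its fractional part
    (the mantissa). Bright regions then wrap far less often than under
    linear modulo, which eases the unwrapping problem.
    """
    return np.mod(np.log2(irradiance + eps), 1.0)

hdr_scene = np.exp(np.random.uniform(0, 10, (64, 64)))   # several decades of irradiance
wrapped_linear = modulo_encode(hdr_scene)
wrapped_log = mantissa_encode(hdr_scene)
```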
Bio: Haley So is a 3rd year PhD student researching in Professor Gordon Wetzstein’s Computational Imaging Lab. She is interested in the co-design of hardware and software, particularly in utilizing emerging sensors to rethink imaging algorithms and computer vision tasks.
Title: Crosstalk Elimination by Rearranging Color Filters for Hyperspectral Imaging
Authors: Thomas Goossens, Brian Wandell
Abstract: Patterned thin-film Fabry–Pérot filters are used to develop compact hyperspectral cameras. Recent articles report on crosstalk in such devices, raising concerns regarding spectral and spatial resolution. The mechanism proposed in this poster is that the Fabry–Pérot filters act as coupled waveguides that can propagate crosstalk above the pixel array. The results show that the crosstalk can be asymmetrical. This enables its elimination by simply rearranging the filters on the sensor. These findings reveal untapped opportunities for developing better sensors and a corresponding need for further systematic investigations.
Bio: Thomas Goossens is a postdoctoral fellow at Stanford University working on camera simulation. He obtained his PhD at KU Leuven in Belgium in collaboration with IMEC, working on hyperspectral imaging.