2023 SCIEN Affiliates Meeting Distinguished Poster Awards

Amazon
Title: Synthesize City Walk Videos from Street Maps

Authors: Boyang Deng, Richard Tucker, Zhengqi Li, Leonidas Guibas, Gordon Wetzstein, Noah Snavely

Abstract: Given a map of a city, e.g., London, we aim to generate a street walk video conditioned on a style described by text, e.g. “New York in the Rain”. The viewpoint of each frame can then be controlled by the user according to the map as if walking through the scene. With abundant camera imagery from Google Street View and aerial imagery from Google Earth, existing text-to-image generation approaches can generate high-quality single frames for such videos. Yet, making the whole video consistent across different views for large city scenes is still an open problem, particularly when a training set of such videos is absent. In contrast to prior work that focuses on propagating information across different frames, we approach this problem from a novel search perspective. From all the diverse images we can generate for each frame, our algorithm seeks a sequence of images that is multi-view consistent. While the coarse and noisy geometry given by the street map and street height map is insufficient for transforming frames in a way that is geometrically accurate, this coarse geometry is very informative for scoring multi-view consistency across frames, which is vital to our search algorithm. Additionally, we derive geometry-aware sampling techniques to accelerate the search process. Our results show that our algorithm generates notably more consistent videos compared to prior video generation methods. Meanwhile, at the cost of imperfect multi-view consistency, our algorithm achieves higher per-frame quality than prior street view reconstruction or generation methods with an actual 3D representation. We also showcase examples of creative video creation using our algorithm.
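
The abstract casts consistent video generation as a search over many candidate images per frame, scored for multi-view consistency using the coarse map geometry. As a rough illustration of that idea (not the authors' implementation; the warping-based score, the candidate count, and the Viterbi-style dynamic program below are all assumptions), the search could look like this in Python:

```python
import numpy as np

def consistency_score(img_a, img_b, warp_ab):
    """Score agreement between frame t and frame t+1 after warping with the
    coarse geometry. Here warp_ab is a precomputed flat pixel-index map; the
    real system's scoring function is not specified in the abstract."""
    warped = img_a.reshape(-1, 3)[warp_ab].reshape(img_b.shape)
    return -np.mean(np.abs(warped - img_b))  # higher is more consistent

def most_consistent_sequence(candidates, warps):
    """Viterbi-style search: candidates[t] is a list of K generated images for
    frame t, warps[t] maps frame t pixels into frame t+1 using coarse geometry."""
    T, K = len(candidates), len(candidates[0])
    score = np.zeros(K)                  # best score ending in candidate k at frame t
    back = np.zeros((T, K), dtype=int)   # backpointers for the traceback
    for t in range(1, T):
        pair = np.array([[score[i] + consistency_score(candidates[t - 1][i],
                                                       candidates[t][j], warps[t - 1])
                          for i in range(K)] for j in range(K)])
        back[t] = pair.argmax(axis=1)
        score = pair.max(axis=1)
    path = [int(score.argmax())]         # trace back the most consistent path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

The geometry-aware sampling techniques mentioned in the abstract would further restrict which candidates get generated and scored; they are not modeled in this sketch.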

Bio: Boyang Deng is a second-year PhD student in CS at Stanford, jointly supervised by Prof. Gordon Wetzstein and Prof. Leonidas Guibas. He also works at Google Research as a part-time student researcher. Prior to Stanford, he worked as a research scientist at Waymo Research and Google Brain.

Apple
Title: Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition

Authors: Cindy M. Nguyen, Eric R. Chan, Alexander W. Bergman, Gordon Wetzstein

Abstract: Capturing images is a key part of automation for high-level tasks such as scene text recognition. Low-light conditions pose a challenge for high-level perception stacks, which are often optimized on well-lit, artifact-free images. Reconstruction methods for low-light images can produce well-lit counterparts, but typically at the cost of high-frequency details critical for downstream tasks. We propose Diffusion in the Dark (DiD), a diffusion model for low-light image reconstruction for text recognition. DiD provides reconstructions that are qualitatively competitive with the state of the art (SOTA), while preserving high-frequency details even in extremely noisy, dark conditions. We demonstrate that DiD, without any task-specific optimization, can outperform SOTA low-light methods in low-light text recognition on real images, bolstering the potential of diffusion models to solve ill-posed inverse problems.
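
Since the abstract does not detail DiD's architecture or sampling procedure, the following is only a generic sketch of how a diffusion model can be conditioned on a low-light measurement during sampling; the `denoiser` callable, the noise schedule, and the step count are all placeholders rather than the authors' settings:

```python
import torch

@torch.no_grad()
def conditional_ddpm_sample(denoiser, low_light, steps=1000):
    """Illustrative DDPM-style sampling loop conditioned on a low-light image.
    `denoiser(x_t, low_light, t)` is assumed to predict the added noise."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(low_light)            # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(x, low_light, t)        # noise prediction, conditioned on the input
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                                   # estimate of the well-lit image
```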

Bio: Cindy Nguyen is a fifth-year PhD Candidate in the Stanford Computational Imaging Lab. Her interests lie in computational photography and image reconstruction using deep learning and generative AI.

Google
Title: PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors

Authors: Haley So, Laurie Bose, Piotr Dudek, and Gordon Wetzstein

Abstract: Conventional image sensors digitize high-resolution images at fast frame rates, producing a large amount of data that needs to be transmitted off the sensor for further processing. This is challenging for perception systems operating on edge devices, because communication is power inefficient and induces latency. Fueled by innovations in stacked image sensor fabrication, emerging sensor–processors offer programmability and minimal processing capabilities directly on the sensor. We exploit these capabilities by developing an efficient recurrent neural network architecture, PixelRNN, that encodes spatio-temporal features on the sensor using purely binary operations. PixelRNN reduces the amount of data to be transmitted off the sensor by factors up to 256 compared to the raw sensor data while offering competitive accuracy for hand gesture recognition and lip reading tasks. We experimentally validate PixelRNN using a prototype implementation on the SCAMP-5 sensor–processor platform.
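
As a toy illustration of the kind of in-pixel recurrence described above (the real PixelRNN runs on the SCAMP-5 sensor-processor; the layer sizes, weights, and update rule below are assumptions, not the published architecture), a binary recurrent update that compresses a frame stream into a small hidden state might look like:

```python
import numpy as np

def binary_rnn_step(state, frame, w_x, w_h):
    """One step of a toy binary recurrent cell: binary inputs and states,
    weights in {-1, +1}, sign nonlinearity."""
    pre = w_x @ frame + w_h @ state       # integer accumulation
    return np.where(pre >= 0, 1, -1)      # binarized hidden state

# Toy usage: a 16x16 binary frame flattened to 256 inputs, 64 hidden units.
rng = np.random.default_rng(0)
w_x = rng.choice([-1, 1], size=(64, 256))
w_h = rng.choice([-1, 1], size=(64, 64))
state = np.ones(64)
for _ in range(10):                       # process 10 frames
    frame = rng.choice([-1, 1], size=256)
    state = binary_rnn_step(state, frame, w_x, w_h)
# Only the 64-value state needs to leave the sensor, not the 256 raw pixels.
```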

Bio: Haley So is a PhD candidate in Professor Gordon Wetzstein’s Computational Imaging Lab. She is interested in utilizing emerging sensors to rethink imaging algorithms and computer vision tasks.

HYC
Title: Thermal Radiance Fields

Authors: Yvette Lin*, Xin-Yi Pan*, Sara Fridovich-Keil, Gordon Wetzstein

Abstract: Thermal infrared imaging has a variety of applications, from agricultural monitoring to building inspection to imaging under poor visibility, such as in low light, fog and rain. Studying large objects or navigating in complex environments requires combining multiple thermal images into a spatially coherent 3D reconstruction, or radiance field. However, reconstructing infrared scenes poses several challenges due to the comparatively lower resolution, narrower field of view, and fewer features present in infrared images. To overcome these challenges, we propose a unified framework for scene reconstruction from a set of uncalibrated infrared and RGB images, using a Neural Radiance Field (NeRF) to represent a scene viewed by both visible and infrared cameras, thus leveraging information across both spectra. We calibrate the RGB and infrared cameras with respect to each other as a preprocessing step using a simple calibration image. We demonstrate our method on real-world sets of RGB and infrared photographs captured from a handheld thermal camera, showing the effectiveness of our method at scene representation across the visible and infrared spectrum.
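
To make the cross-spectral idea concrete, here is a minimal sketch (assuming a PyTorch-style coordinate MLP; the actual network, positional encoding, and volume rendering are not described in the abstract) of a single field that returns density plus both RGB and thermal radiance, so the same geometry can be supervised by visible and infrared images:

```python
import torch
import torch.nn as nn

class RGBThermalField(nn.Module):
    """Toy radiance field that outputs density, RGB color, and a scalar thermal
    radiance per 3D point, so one scene representation can be rendered for both
    visible and infrared cameras. View dependence and encodings are omitted."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 5),              # density, R, G, B, thermal
        )

    def forward(self, xyz):
        out = self.mlp(xyz)
        density = torch.relu(out[..., 0])
        rgb = torch.sigmoid(out[..., 1:4])
        thermal = torch.relu(out[..., 4])      # shared geometry, separate appearance
        return density, rgb, thermal
```

In a setup like this, photometric losses from the RGB views and the thermal views would both backpropagate into the shared geometry, which is one way the higher-resolution visible images can compensate for the lower resolution and sparser features of the infrared views.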

Bio: Yvette Lin is a master’s student in computer science at Stanford University. Her research interests lie in computational imaging, machine learning, and inverse problems.
Xin-Yi Pan is a master’s student in electrical engineering at Stanford University. Her interests lie in the interaction of physics and computer science, applied to optics and imaging.
Sara Fridovich-Keil is a postdoc at Stanford working with Professors Gordon Wetzstein and Mert Pilanci. She completed her PhD at UC Berkeley where she was advised by Professor Ben Recht.

Intuitive Surgical
Title: Quantitative AFI (Autofluorescence Imaging) for Oral Cancer Screening

Authors: Xi Mou, Zhenyi Liu, Haomiao Jiang, Brian Wandell, Joyce Farrell

Abstract: Autofluorescence imaging (AFI) is a non-invasive and real-time imaging technique that has proven valuable for early detection and monitoring of oral cancers. In many cases, early detection can extend patients’ lives and improve their quality of life. In the case of oral cancer, AFI visualizes the fluorescence emitted by endogenous tissue fluorophores in the mouth, with no need for exogenous labels or dyes. Today’s AFI devices for detecting oral lesions rely on the subjective judgment of clinicians, which in turn depends on their training and visual abilities. We are designing an autofluorescence imaging system that obtains quantitative imaging data for oral cancer screening. The instrument design is guided by measurements and simulation. We employ an excitation light with a peak wavelength, spectral bandwidth, and beam angle that maximize tissue fluorescence without causing tissue damage. Our measurements indicate that the autofluorescence signal generated by the tongue is about four orders of magnitude less intense than the light reflected from the tongue. To detect this signal, we use a camera with a longpass filter that reduces the reflected light reaching a calibrated imaging sensor. The agreement between our simulations and measurements makes it possible to quantify tissue fluorescence, evaluate hypotheses about the underlying tissue fluorophores, and potentially develop lab tests for oral lesion diagnosis.
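
A quick back-of-the-envelope calculation shows what the reported four-orders-of-magnitude ratio implies for the longpass filter; only the 1e4 ratio comes from the measurements described above, while the target margin and the resulting optical density are illustrative assumptions:

```python
import math

# Autofluorescence is ~1e4 times weaker than the reflected excitation light.
# For the fluorescence to dominate by, say, 10x at the sensor, the longpass
# filter must attenuate the reflected excitation by ~1e5 (optical density ~5)
# while passing the longer-wavelength fluorescence.
reflected = 1.0
fluorescence = reflected / 1e4
target_margin = 10.0
required_attenuation = reflected / (fluorescence / target_margin)
print(f"required attenuation: {required_attenuation:.0e} "
      f"(OD ~ {math.log10(required_attenuation):.0f})")
```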

Bio: Xi Mou is a postdoctoral scholar at Stanford University, advised by Prof. Brian Wandell and Dr. Joyce Farrell. Her research focuses on simulation and design of medical imaging devices.

Meta
Title: Efficient Geometry-Aware 3D Generative Adversarial Networks

Authors: Eric Chan*, Connor Lin*, Matthew Chan*, Koki Nagano*, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Tremblay, Sameh Khamis, Tero Karras, Gordon Wetzstein

Abstract: Recent advances in neural rendering and generative models have enabled efficient, near-photorealistic generation of 3D objects. In this demonstration, we showcase real-time 3D avatar generation running on a consumer PC. Similar generative 3D technology may soon transform how we create 3D objects, avatars, and virtual worlds, with exciting applications in film, video games, the metaverse, and beyond.

Bio: Eric Chan is a Ph.D. student at Stanford, where he is currently working with Prof. Gordon Wetzstein’s Computational Imaging group. During his childhood in Oakland, CA, a family full of architects and many years spent in robotics competitions instilled an appreciation for design, robotic locomotion, and spatial understanding. After studying mechanical engineering and computer science at Yale, he began learning the basics of computer vision in the hope of teaching his robots and algorithms how to better understand the world around them. Over the last couple of years, his focus has shifted to the intersection of 3D graphics and vision—to generalization across 3D representations and 3D generative models. Find more at ericryanchan.github.io.

Qualcomm
Title: Real-Time Hand Keypoint Detection on Edge

Authors: Mohammad Asadi, Haley So, Gordon Wetzstein

Abstract: We demonstrate real-time 2D hand keypoint detection on the edge using the RTMPose pipeline with a CSPNeXt backbone, running on MINOTAUR. In this work, we quantize and compress the RTMPose real-time keypoint detection model and adapt it to MINOTAUR.
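
The abstract does not specify the quantization scheme, so the following is only a generic sketch of 8-bit post-training weight quantization (affine scale and zero point), the family of techniques typically used to shrink a model such as RTMPose for an edge target like MINOTAUR:

```python
import numpy as np

def quantize_int8(w):
    """Affine (asymmetric) 8-bit quantization: map the float range [min, max]
    of a weight tensor onto [-128, 127] with a scale and zero point."""
    qmin, qmax = -128, 127
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = int(round(qmin - w.min() / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(64, 64).astype(np.float32) * 0.1   # stand-in weight tensor
q, scale, zp = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale, zp)).max())
```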

Bio: I am a first-year EE PhD student and currently a rotation student in the Computational Imaging Lab. Previously, I worked on interpretable AI-based feedback systems for education as an intern at the ML4ED laboratory at EPFL, and on uncertainty estimation of human motion recognition models for autonomous vehicles at the VITA laboratory at EPFL.

Rivian
Title: Physics-based Lens Flare Simulation for Nighttime Driving

Authors: Zhenyi Liu, Devesh Shah, Alireza Rahimpour, Devesh Upadhyay, Joyce Farrell, Brian Wandell

Abstract: Nighttime driving images present unique challenges compared to daytime images, including low-intensity regions and bright light sources causing sensor saturation and lens flare artifacts. These issues impair computer vision models and make image labeling for network training both costly and error-prone. To address this, we developed an end-to-end image system simulation for creating realistic nighttime images. Our approach involves characterizing the simulation system, generating a synthetic nighttime dataset with detailed labels for training, and demonstrating its effectiveness in tasks like flare removal and object detection.
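
As a minimal illustration of why bright nighttime sources are problematic (this is not the physics-based end-to-end simulation described above, just a toy showing how a limited full-well capacity clips away detail near a bright source):

```python
import numpy as np

h, w = 128, 128
scene = np.full((h, w), 0.01)                 # dim nighttime background radiance
yy, xx = np.mgrid[:h, :w]
headlight = 50.0 * np.exp(-((xx - 64) ** 2 + (yy - 64) ** 2) / 50.0)
radiance = scene + headlight                  # high-dynamic-range input
full_well = 1.0
raw = np.clip(radiance, 0, full_well)         # saturated pixel values
print("saturated pixels:", int((radiance > full_well).sum()))
```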

Bio: Zhenyi Liu is a postdoctoral scholar in the Psychology Department at Stanford University, advised by Prof. Brian Wandell and Dr. Joyce Farrell. His research interests focus on physics-based, full-pipeline simulation of imaging systems for autonomous driving and consumer photography.

Samsung
Title: DRGN-AI: Ab initio reconstruction of heterogeneous structural ensembles

Authors: Axel Levy, Frederic Poitevin, Gordon Wetzstein, Ellen Zhong

Abstract: Proteins and other biomolecules form dynamic macromolecular machines that are tightly orchestrated to move, bind, and perform chemistry. Experimental techniques such as cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET) can access the structure, motion, and interactions of macromolecular complexes. We introduce DRGN-AI, a unified framework for ab initio heterogeneous reconstruction of single-particle cryo-EM and cryo-ET subtomogram data. By fusing the flexibility of implicit neural representations with a robust and scalable strategy for pose estimation, DRGN-AI circumvents the need for structural priors or input poses, enabling the discovery of new biological states and previously unresolved molecular motion. For the first time, we demonstrate ab initio heterogeneous subtomogram reconstruction of a cryo-ET dataset. Our method is released as part of the open-source cryoDRGN software.
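
As a rough sketch of the central idea behind heterogeneous reconstruction with implicit neural representations (the layer sizes, latent dimension, and everything else below are assumptions, not the DRGN-AI architecture), a per-particle latent can condition a coordinate network so that different latents decode to different conformations:

```python
import torch
import torch.nn as nn

class LatentConditionedVolume(nn.Module):
    """Toy implicit representation of a heterogeneous ensemble: a coordinate
    MLP maps (3D position, per-particle latent z) to density. Fourier features,
    pose search, and image formation are omitted; this is only an illustration."""
    def __init__(self, z_dim=8, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz, z):
        # xyz: (N, 3) query coordinates, z: (z_dim,) latent for one particle
        z = z.expand(xyz.shape[0], -1)
        return self.net(torch.cat([xyz, z], dim=-1)).squeeze(-1)
```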

Bio: Axel Levy is a fourth-year PhD student in Electrical Engineering at Stanford University. He is advised by Prof. Mike Dunne (director of LCLS at SLAC National Lab) and Prof. Gordon Wetzstein (head of SCI). His research focuses on solving 3D reconstruction problems in unknown-view setups. Most of his work addresses the problem of 3D molecular reconstruction from cryo-electron microscopy images. Prior to his PhD, Axel graduated from the Ecole Polytechnique (France).

Sensing
Title: Multifunctional Spaceplates for Chromatic and Spherical Aberration Correction

Authors: Yixuan Shao, Robert Lupoiu, Jiaqi Jiang, You Zhou, Jonathan A. Fan

Abstract: Over the last decade, substantial research efforts have been devoted to miniaturizing imaging systems. Recently, the invention of a new class of optical devices, known as spaceplates, has offered an innovative approach to shrinking the thicknesses of air gaps between lenses in imaging systems. Spaceplates emulate the optical response of free space within a reduced physical space. A typical spaceplate design employs optimized thin-film multilayer structures, which provide an extensive range of design freedom and the possibility of integrating additional functionalities. In this study, we leverage this versatility of spaceplates to assume part of the aberration-correction responsibility traditionally assigned to lenses. We present the design and application of multifunctional spaceplates aimed at correcting chromatic and spherical aberrations. This approach obviates the need for multiple corrector lenses without expanding the system’s footprint by effectively reusing the space of the air gaps, making spaceplates ideal for compact integrated systems where size constraints are a critical concern, such as virtual reality and augmented reality applications.
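
In the standard angular-spectrum picture (the symbols below are defined here, not taken from the poster), a spaceplate of physical thickness d_plate that emulates a free-space gap d_eff must approximate the free-space transfer phase for every transverse wavevector within its design range, and its benefit is summarized by the compression ratio:

```latex
H(k_x, k_y) = \exp\!\left( i\, d_{\mathrm{eff}} \sqrt{k_0^{2} - k_x^{2} - k_y^{2}} \right),
\qquad k_0 = \frac{2\pi}{\lambda},
\qquad \mathcal{R} = \frac{d_{\mathrm{eff}}}{d_{\mathrm{plate}}} .
```

A multifunctional spaceplate of the kind described above would additionally tailor the wavelength and angle dependence of this response so that it offsets part of the lens's chromatic and spherical aberration rather than exactly reproducing free space.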

Bio: Yixuan Shao is a 3rd-year PhD student in electrical engineering. Working with Prof. Jonathan Fan, he researches the design of ultra-thin imaging systems with reduced optical aberrations.

Synopsys
Title: Full-Color Metasurface Waveguide Holography

Authors: Manu Gopakumar, Gun-Yeal Lee, Suyeon Choi, Brian Chao, Yifan Peng, Jonghyun Kim, Gordon Wetzstein

Abstract: Recent advances in augmented reality (AR) technology are expected to revolutionize the way digital data is integrated with users’ perception of the real world, opening up new possibilities in various fields such as entertainment, education, communication, and training. Despite its potential, the widespread implementation of AR display technology faces obstacles due to the bulky projection optics of display engines and the challenge of displaying accurate 3D depth cues for virtual content, among other factors. Here, we present an innovative holographic AR display system that addresses these challenges through a unique and synergistic combination of nanophotonic hardware technology and artificial intelligence (AI) software technology. Thanks to a compact metasurface waveguide system and AI-driven holography algorithms, our AR holographic display system delivers high-quality, full-color 3D augmented reality content in a compact device form factor. The core techniques of our work are inverse-designed metasurfaces with a dispersion-compensating waveguide geometry and an image formation model that takes into account both physical waveguide models and learned components, which are automatically calibrated using camera-in-the-loop technology. The integration of nanophotonic metasurface waveguides and AI-based holography algorithms represents a major leap forward in producing immersive 3D augmented reality experiences with a compact wearable device.
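
The camera-in-the-loop idea can be sketched as gradient-based optimization of a phase pattern through a differentiable forward model; in this sketch the `forward_model` callable stands in for the physical waveguide/metasurface propagation model plus its learned, camera-calibrated components, and the optimizer, loss, and iteration count are assumptions rather than the authors' settings:

```python
import torch

def optimize_phase(target, forward_model, iters=500, lr=0.1):
    """Gradient-descent hologram optimization in the spirit of
    camera-in-the-loop holography: `forward_model` maps an SLM phase pattern
    to a predicted image; we minimize the error against the target image."""
    phase = torch.zeros_like(target, requires_grad=True)
    opt = torch.optim.Adam([phase], lr=lr)
    for _ in range(iters):
        pred = forward_model(phase)                      # simulated camera image
        loss = torch.nn.functional.mse_loss(pred, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return phase.detach()
```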

Bio: Manu Gopakumar is a PhD student in the Stanford Computational Imaging Lab. His research interests center on co-designing optical systems and computational algorithms to build next-generation VR and AR headsets.
Gun-Yeal Lee is a postdoctoral researcher in the Stanford Computational Imaging Lab. His current research focuses on novel optical applications using nanophotonics and metasurface optical elements to develop next-generation display and imaging systems.