SCIEN Industry Affiliates Meeting 2017: Poster Presentations
Posters:
Note: This is a partial list of posters; check back for updates.
Accommodation-invariant Computational Near-eye Displays by Robert Konrad, Nitish Padmanaban, Keenan Molner, Emily A. Cooper, Gordon Wetzstein
Stacked Omnistereo for Virtual Reality with Six Degrees of Freedom by Jayant Thatte and Bernd Girod
Vortex: Live Cinematic Virtual Reality by Robert Konrad, Donald G. Dansereau, Aniq Masood, Gordon Wetzstein
Ray-tracing 3D Spectral Scenes Through Human Optics by Trisha Lian and Brian Wandell
Yarn-Level Cloth Simulation for Predictive Cloth Modeling by Jonathan Leaf, Raj Setaluri, and Doug James
VRduino: A Low-cost Head Tracking System for Virtual Reality by Marcus Pan, Keenan Molner and Gordon Wetzstein
Revisiting Recurrent Neural Networks for Video-based Person Re-Identification by Jean-Baptiste Boin and Bernd Girod
Determinants of neural responses to disparity in natural scenes by Yiran Duan, Alexandra Yakovleva and Anthony M. Norcia
Towards high quality image and depth estimation in dual cameras by Rose Rustowicz and Gordon Wetzstein
Dirty Pixels: Optimizing Image Classification Architectures for Raw Sensor Data by Steven Diamond, Vincent Sitzmann, Stephen Boyd, Gordon Wetzstein, Felix Heide
Unrolled Optimization with Deep Priors by Steven Diamond, Vincent Sitzmann, Felix Heide, Gordon Wetzstein
Integrated All-Frequency Sound Synthesis for Computer Animation by Jui-Hsien Wang, Timothy R. Langlois, Ante Qu and Doug L. James
Confocal Non-Line-of-Sight Imaging with the Light Cone Transform by Matthew O’Toole, David B. Lindell, Gordon Wetzstein
Non-Line-of-Sight Imaging from Light Detection and Ranging Specific Single Indirect Response by Chien-Yi Chang, Matthew O’Toole, David B Lindell, Gordon Wetzstein
Deep End-to-End Time-of-Flight Imaging by Shuochen Su, Felix Heide, Wolfgang Heidrich, Gordon Wetzstein
Single-Photon LIDAR with Deep Sensor Fusion by David B. Lindell, Matthew O’Toole, and Gordon Wetzstein
Stimulus dependency across the reading circuitry by Rosemary Le and Brian Wandell
Versatile Neural Modules for Flexible Learning in Embodied Visual Environments by Kevin T. Feigelis, Blue Sheffer, Daniel L.K. Yamins
Predicting Nauseogenicity of Virtual Content via Machine Learning by Nitish Padmanaban, Timon Ruban, Vincent Sitzmann, Anthony M. Norcia, Gordon Wetzstein
Learning Light Field Features by Donald Dansereau, Jayant Thatte, Vincent Sitzmann, Bernd Girod and Gordon Wetzstein
Single-shot speckle correlation fluorescence microscopy in thick scattering tissue with image reconstruction priors by Julie Chang and Gordon Wetzstein
3D Deconvolution Microscopy for Extremely-noisy Images by Hayato Ikoma, Michael Broxton and Gordon Wetzstein
Global representations of goal-directed behavior in distinct cell types of mouse neocortex by W. Allen, I. Kauvar, M. Chen, E. Richman, S. Yang, K. Chan, V. Gradinaru, B. Deverman, L. Luo, K. Deisseroth
Attenuation-Based 3D Display Using Stacked LCD by Jason Ginsberg and Neil Movva
Abstracts
Title: Stacked Omnistereo for Virtual Reality with Six Degrees of Freedom
Authors: Jayant Thatte and Bernd Girod
Abstract: Most of the live-action virtual reality content available today is captured and rendered from a fixed vantage point disregarding the viewer’s head motion. Lack of motion parallax not only makes the experience less immersive, but also causes the viewer significant discomfort. In this work, we present Stacked OmniStereo, a novel data representation that can render virtual environments to allow six degrees of freedom (6-DoF) viewing. We evaluate our approach with quantitative metrics and subjective examples using natural as well as synthetic scenes. We show that Stacked Omnistereo can synthesize plausible, view-dependent specular highlights, is compact compared to light fields, and significantly outperforms the current state-of-the-art. Additionally, we demonstrate a practical virtual reality rendering system that uses a Stacked OmniStereo intermediary representation to provide a 6-DoF viewing experience utilizing data from a stationary camera rig. The system achieves real-time 6-DoF rendering, following an offline preprocessing stage during which the intermediary representation is constructed from the raw camera images.
Bio:
Jayant Thatte is a 4th year Ph.D. candidate in the Department of Electrical Engineering at Stanford University, advised by Prof. Bernd Girod. His research interests lie at the intersection of image processing and computer graphics with a specific focus on motion parallax in virtual reality systems. He is particularly interested in developing algorithms that address the visual-vestibular and vergence-accommodation conflicts present in current virtual reality systems. He received his Bachelor’s and Master’s degrees, both from the Department of Electrical Engineering at the Indian Institute of Technology Madras, in 2014, where he won the Philips India Award for an outstanding academic record.
Title: Vortex: Live Cinematic Virtual Reality
Authors: Robert Konrad, Donald G. Dansereau, Aniq Masood, Gordon Wetzstein
Abstract: We present Vortex, an architecture for live-streaming 3D virtual reality video. Vortex uses two fast line sensors combined with wide-angle lenses, spinning at up to 300 rpm, to directly capture stereoscopic 360-degree virtual reality video in the widely-used omni-directional stereo (ODS) format. In contrast to existing VR capture systems, no expensive post-processing or complex calibration is required, enabling live streaming of high quality 3D VR content. We capture a variety of example videos showing indoor and outdoor scenes and analyze system design tradeoffs in detail.
Bio: Donald G. Dansereau is a postdoctoral scholar at the Stanford Computational Imaging Lab. His research is focused on computational imaging for robotic vision, and he is the author of the Light Field Toolbox for Matlab. In 2004 he completed an MSc at the University of Calgary, receiving the Governor General’s Gold Medal for his pioneering work in light field processing. In 2014 he completed a PhD on underwater robotic vision at the Australian Centre for Field Robotics, University of Sydney. Donald’s industry experience includes physics engines for video games, computer vision for microchip packaging, and FPGA design for automatic test equipment. His field work includes marine archaeology on a Bronze Age city in Greece, hydrothermal vent mapping in the Sea of Crete, habitat monitoring off the coast of Tasmania, and wreck exploration in Lake Geneva.
Title: Ray-tracing 3D Spectral Scenes Through Human Optics
Authors: Trisha Lian and Brian Wandell
Abstract: Display technology design benefits from a quantitative understanding of how parameters of novel displays impact the retinal image. Vision scientists have developed many precise computations and facts that characterize critical steps in vision, particularly at the first stages of light encoding. ISETBIO is an open-source implementation that aims to provide these computations. The initial implementation modeled image formation for distant or planar scenes. Here, we extend ISETBIO by using computer graphics and ray-tracing to model how spectral, three-dimensional scenes are transformed by human optics into retinal irradiance.
Given a synthetic 3D scene, we trace rays using PBRT (Physically Based Ray-Tracer) through an optical model of the human eye to obtain the spectral irradiance at the retina. The optical model specifies wavelength-dependent index of refraction and surface parameters; these are chosen to match the curvature, size, and asphericity of the cornea, lens, and retina. The methods can implement other eye models, including those with biconic surfaces. The simulation accounts for the chromatic dispersion of light in different ocular media, as well as the effects of accommodation and pupil size.
We compare the retinal irradiance generated from the simulation with experimental measurements from the literature. The sharpness of the computed retinal image matches statistical models. Further, the longitudinal chromatic aberration in our renderings closely matches experimental data.
The ray tracing calculations enable us to understand the impact of different 3D display parameters on the retinal spectral irradiance. This ability may also prove useful for understanding the information available to the visual system to perform critical tasks, such as accommodation and vergence. The simulation tools are available in the ISETBIO Github repository.
Bio: Trisha Lian is an Electrical Engineering PhD student working with Professor Brian Wandell on simulation tools that model the human visual system and enable soft-prototyping and evaluation of novel imaging systems, such as light field cameras and camera rigs that capture stereo 360 content.
Title: VRduino: A Low-cost Head Tracking System for Virtual Reality
Authors: Marcus Pan, Keenan Molner, Gordon Wetzstein
Abstract: A natural VR viewing experience is only possible with good head tracking. The VR system needs to know where a user is looking instantaneously in order to render the right scene.
We designed a system to robustly track a user’s head movements with low-cost electronics and 3D computer vision algorithms. Our system uses an HTC Vive base station and a custom PCB with photodiodes, an IMU, and a Teensy microcontroller. The photodiodes capture localization pulses from the base station. The microcontroller reads in all sensor data and computes the position and orientation of the board at a high frequency. All pose computation is done on the microcontroller, and an external PC is only required for visualization.
This system would enable VR enthusiasts with DIY headsets to achieve robust head tracking.
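The sketch below (Python, for illustration only) shows one way the pose computation described above could be set up: a DLT homography estimated from four photodiode projections, then decomposed into a rotation and translation. The photodiode layout, projection values, and helper names are hypothetical assumptions, not the actual firmware.

    # Minimal sketch, not the VRduino firmware: planar pose from four photodiode
    # projections via a DLT homography. All numeric values below are made up.
    import numpy as np

    def estimate_homography(obj_pts, img_pts):
        """DLT: solve for H such that img ~ H @ [X, Y, 1] for planar object points."""
        A = []
        for (X, Y), (u, v) in zip(obj_pts, img_pts):
            A.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y, -u])
            A.append([0, 0, 0, X, Y, 1, -v * X, -v * Y, -v])
        _, _, Vt = np.linalg.svd(np.asarray(A))
        H = Vt[-1].reshape(3, 3)
        return H / H[2, 2]

    # Hypothetical photodiode positions on the board (meters) and their projections
    # (normalized coordinates derived from the base station's sweep timings).
    board = [(-0.04, 0.025), (0.04, 0.025), (0.04, -0.025), (-0.04, -0.025)]
    proj = [(-0.11, 0.08), (0.12, 0.07), (0.13, -0.06), (-0.12, -0.07)]
    H = estimate_homography(board, proj)

    # Decompose H = [r1 r2 t] (up to scale) into a rotation and translation.
    scale = np.linalg.norm(H[:, 0])
    r1, r2, t = H[:, 0] / scale, H[:, 1] / scale, H[:, 2] / scale
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    print("position:", t)
    print("rotation:\n", R)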
Bio: Marcus is a Master’s student in EE working in Prof. Gordon Wetzstein’s Computational Imaging Lab.
Title: Predicting Nauseogenicity of Virtual Content via Machine Learning
Authors: Nitish Padmanaban, Timon Ruban, Vincent Sitzmann, Anthony M. Norcia, Gordon Wetzstein
Abstract: Virtual reality systems are widely believed to be the next major computing platform. There are, however, some barriers to adoption that must be addressed, such as that of motion sickness — which can lead to undesirable symptoms including postural instability, headaches, and nausea. Motion sickness in virtual reality occurs as a result of moving visual stimuli that cause users to perceive self-motion while they remain stationary in the real world. There are several contributing factors to both this perception of motion and the subsequent onset of sickness, including field of view, motion velocity, and stimulus depth. We verify first that differences in vection due to relative stimulus depth remain correlated with sickness. Then, we build a dataset of stereoscopic 3D videos and their corresponding sickness ratings in order to quantify their nauseogenicity, which we make available for future use. Using this dataset, we train a machine learning algorithm on hand-crafted features (quantifying speed, direction, and depth as functions of time) from each video, learning the contributions of these various features to the sickness ratings. Our predictor generally outperforms a naive estimate, but is ultimately limited by the size of the dataset. However, our result is promising and opens the door to future work with more extensive datasets. This and further advances in this space have the potential to alleviate developer and end user concerns about motion sickness in the increasingly commonplace virtual world.
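As a rough illustration of the learning setup described above (hand-crafted per-video features regressed against sickness ratings), the following Python sketch uses scikit-learn with placeholder data; the feature set, model choice, and rating scale are assumptions, not the authors' pipeline.

    # Minimal sketch with synthetic placeholder data, not the authors' dataset.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    # One row per video: e.g. mean speed, speed variance, dominant direction, mean depth.
    X = rng.normal(size=(60, 4))
    y = rng.uniform(1, 10, size=60)  # hypothetical subjective sickness ratings

    model = Ridge(alpha=1.0)
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print("MAE per fold:", -scores)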
Bio: Nitish Padmanaban is a third year PhD candidate at Stanford EE, advised by Prof. Gordon Wetzstein as part of the Stanford Computational Imaging Lab. His research is focused on optical and computational techniques for virtual and augmented reality, with an emphasis on building and evaluating displays to alleviate the vergence–accommodation conflict, and also the role of vestibular system conflicts in causing motion sickness in VR. Nitish is supported by a NSF Graduate Research Fellowship. He received his Master’s from Stanford EE in 2017, and a Bachelor’s in EECS from UC Berkeley in 2015.
Title: Determinants of neural responses to disparity in natural scenes
Authors: Yiran Duan, Alexandra Yakovleva, Anthony M. Norcia
Abstract: We studied disparity-evoked responses in natural scenes using high density EEG in an event-related design. Thirty natural scenes that mainly included outdoor settings with trees and buildings were used. Twenty-four subjects viewed a series of trials consisting of sequential two-alternative temporal forced-choice presentations of two different versions (2D vs. 3D) of the same scene, interleaved with a scrambled image with the same power spectrum. Scenes were viewed orthostereoscopically at three meters through a pair of shutter glasses. After each trial, participants indicated with a key press which version of the scene was 3D. Performance on the discrimination was >90%. Participants who were more accurate also tended to respond faster; scenes that were reported more accurately as 3D also led to faster reaction times. We compared Visual Evoked Potentials elicited by scrambled, 2D and 3D scenes using Reliable Component Analysis to reduce dimensionality. The disparity-evoked response to natural scene stimuli, measured from the difference potential between 2D and 3D scenes, comprised a sustained relative negativity in the dominant response component. The magnitude of the disparity-specific response was correlated with the observer’s stereoacuity. Scenes with more homogeneous depth maps also tended to elicit larger disparity-specific responses. Finally, the magnitude of the disparity-specific response was correlated with the magnitude of the differential response between scrambled and 2D scenes, suggesting that monocular higher-order scene statistics modulate disparity-specific responses.
Bio: Yiran is a graduate student in the Department of Psychology working with Prof. Anthony Norcia. Her research focuses on neural substrates of human 3D perception.
Title: Confocal Non-Line-of-Sight Imaging with the Light Cone Transform
Authors: Matthew O’Toole, David B. Lindell, Gordon Wetzstein
Abstract: Imaging objects hidden from a camera’s view is a problem of fundamental importance to many fields of research with applications in robotic vision, defense, remote sensing, medical imaging, and autonomous vehicles. Non-line-of-sight (NLOS) imaging at macroscopic scales has been demonstrated by scanning a visible surface with a pulsed laser and time-resolved detector. Whereas light detection and ranging (LIDAR) systems use such measurements to recover the shape of visible objects from direct reflections, NLOS imaging aims at reconstructing the shape and albedo of hidden objects from multiply scattered light. Despite recent advances, NLOS imaging has remained impractical due to the prohibitive memory and processing requirements of existing reconstruction algorithms, and the extremely weak signal of multiply scattered light. Here we show that confocalizing the scanning procedure provides a means to address these key challenges. Confocal scanning facilitates the derivation of a novel closed-form solution to the NLOS reconstruction problem, which requires orders of magnitude less computation and memory than previous reconstruction methods and recovers hidden objects at unprecedented image resolutions. Confocal scanning also uniquely benefits from a sizeable increase in signal and range when imaging retroreflective objects. We quantify the resolution bounds of NLOS imaging, demonstrate real-time tracking capabilities, and derive efficient algorithms that incorporate image priors and a physically-accurate noise model. Most notably, we demonstrate successful outdoor experiments for NLOS imaging under indirect sunlight.
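For illustration, the following Python sketch shows a generic frequency-domain Wiener deconvolution of a measurement volume, the kind of closed-form step that a confocal NLOS solver can reduce the reconstruction to; the resampling into light-cone coordinates and the actual kernel are omitted, and the inputs are assumed.

    # Minimal sketch, not the full light-cone-transform pipeline: a 3D Wiener
    # deconvolution applied to an already-resampled volume `vol` with kernel `kern`.
    import numpy as np

    def wiener_deconv_3d(vol, kern, snr=1e2):
        K = np.fft.rfftn(np.fft.ifftshift(kern), s=vol.shape)
        V = np.fft.rfftn(vol)
        W = np.conj(K) / (np.abs(K) ** 2 + 1.0 / snr)  # Wiener filter
        return np.fft.irfftn(W * V, s=vol.shape)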
Bio: Matthew O’Toole is a postdoc in the Stanford Computational Imaging group. He completed his Ph.D. in 2016 at the University of Toronto under the supervision of Prof. Kyros Kutulakos, and his thesis received an “Outstanding Thesis, Honorable Mention” award at SIGGRAPH. He is supported by a Banting Postdoctoral Fellowship from the Government of Canada.
Title: Single-shot speckle correlation fluorescence microscopy in thick scattering tissue with image reconstruction priors
Authors: Julie Chang, Gordon Wetzstein
Abstract: Deep tissue imaging in the multiple scattering regime remains at the frontier of fluorescence microscopy. Speckle correlation imaging (SCI) can computationally uncover objects hidden behind a scattering layer, but has only been demonstrated with scattered laser illumination and in geometries where the scatterer is in the far field of the target object. Here SCI is extended to imaging a planar fluorescent signal at the back surface of a 500-μm-thick slice of mouse brain. The object is reconstructed from a single snapshot via phase retrieval using a proximal algorithm that easily incorporates image priors. Simulations and experiments demonstrate improved image recovery with this approach compared to the conventional SCI algorithm.
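As a hedged illustration of prior-driven phase retrieval from a speckle autocorrelation, the Python sketch below runs a basic Fienup-style error-reduction loop with a nonnegativity constraint standing in for the image prior; it is not the authors' proximal algorithm, and the input autocorrelation is assumed.

    # Minimal sketch: error-reduction phase retrieval with a nonnegativity prior.
    import numpy as np

    def retrieve(autocorr, n_iter=200, seed=0):
        # Fourier magnitude of the object follows from the autocorrelation
        # (Wiener-Khinchin); small negative values are clipped by abs().
        mag = np.sqrt(np.abs(np.fft.fft2(autocorr)))
        rng = np.random.default_rng(seed)
        x = rng.random(autocorr.shape)
        for _ in range(n_iter):
            X = np.fft.fft2(x)
            X = mag * np.exp(1j * np.angle(X))  # enforce measured magnitude
            x = np.real(np.fft.ifft2(X))
            x = np.clip(x, 0, None)             # proximal step: nonnegativity prior
        return x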
Bio: Julie Chang is a Bioengineering PhD student supervised by Gordon Wetzstein working on computational imaging.
Title: Global representations of goal-directed behavior in distinct cell types of mouse neocortex
Authors: W. Allen, I. Kauvar I, M. Chen, E. Richman, S. Yang, K. Chan, V. Gradinaru, B. Deverman, L. Luo, K. Deisseroth
Abstract: The successful planning and execution of adaptive behaviors in mammals may require long-range coordination of neural networks throughout cerebral cortex. The neuronal implementation of signals that could orchestrate cortex-wide activity remains unclear. Here, we develop and apply methods for cortex-wide Ca2+ imaging in mice performing decision-making behavior and identify a global cortical representation of task engagement encoded in the activity dynamics of both single cells and superficial neuropil distributed across the majority of dorsal cortex. The activity of multiple molecularly defined cell types was found to reflect this representation with type-specific dynamics. Focal optogenetic inhibition tiled across cortex revealed a crucial role for frontal cortex in triggering this cortex-wide phenomenon; local inhibition of this region blocked both the cortex-wide response to task-initiating cues and the voluntary behavior. These findings reveal cell-type-specific processes in cortex for globally representing goal-directed behavior and identify a major cortical node that gates the global broadcast of task-related information.
Bio: Isaac Kauvar is a PhD candidate in Electrical Engineering, co-advised by Dr. Gordon Wetzstein and Dr. Karl Deisseroth.
Title: 3D Deconvolution Microscopy for Extremely-noisy Images
Authors: Hayato Ikoma, Michael Broxton and Gordon Wetzstein
Abstract: Fluorescence widefield microscopy is an essential technology in the biological sciences for visualizing the spatial distribution of target molecules. Although images captured with widefield microscopy tend to have low contrast due to out-of-focus light, widefield microscopy surpasses other modalities in light efficiency and therefore remains a key technology for live imaging. To further push its capability, we propose a deconvolution algorithm with a Hessian-based regularizer. Our software estimates a noise curve from the captured focal stack and then applies deconvolution, achieving extreme robustness to noise. We demonstrate that our algorithm successfully deconvolves experimentally captured fluorescence images whose voxels contain fewer than ten photoelectrons, and that it outperforms commercial software in terms of mean squared error.
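To make the deconvolution step concrete, the following Python sketch implements a plain FFT-based 3D Richardson-Lucy iteration; the Hessian regularizer and noise-curve estimation described above are omitted, and the `stack` and `psf` inputs are assumed floating-point arrays.

    # Minimal sketch, not the authors' algorithm: unregularized 3D Richardson-Lucy.
    import numpy as np

    def richardson_lucy_3d(stack, psf, n_iter=30, eps=1e-8):
        psf = psf / psf.sum()
        otf = np.fft.rfftn(np.fft.ifftshift(psf), s=stack.shape)
        est = np.full_like(stack, stack.mean())
        for _ in range(n_iter):
            conv = np.fft.irfftn(np.fft.rfftn(est) * otf, s=stack.shape)
            ratio = stack / (conv + eps)
            # Correlation with the flipped PSF corresponds to conj(otf) in Fourier space.
            est *= np.fft.irfftn(np.fft.rfftn(ratio) * np.conj(otf), s=stack.shape)
        return est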
Bio: Hayato is a PhD student in the Electrical Engineering Department at Stanford University. His research focus is on signal processing and optimization, particularly for image processing and optical microscopy. Before coming to Stanford University, he worked on developing new computational imaging techniques for an optical microscope and a space telescope at the MIT Media Lab and the Centre de Mathématiques et de Leurs Applications at the École Normale Supérieure de Cachan (CMLA, ENS Cachan) in France.
Title: Towards High Quality Image and Depth Estimation in Dual Cameras
Authors: Rose Rustowicz and Gordon Wetzstein
Abstract: Stereo imaging uses data taken from slightly different perspectives to extract depth information, which can also be used to improve image quality. Dual camera modules enable these abilities in imaging systems and have recently appeared in mobile devices such as Huawei’s Mate 9 or P10 and Apple’s iPhone 7 Plus. We review current image processing pipelines for RGB image quality and investigate dual camera systems for depth estimation. We show depth estimation results using a Huawei mobile phone, along with calibration results for the system.
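For context, a standard baseline for dual-camera depth estimation is semi-global block matching; the Python sketch below uses OpenCV's StereoSGBM on an assumed rectified pair. The file names and parameters are placeholders, not the system described above.

    # Minimal sketch: disparity from a rectified stereo pair with OpenCV SGBM.
    import cv2

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical file names
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = sgbm.compute(left, right).astype("float32") / 16.0  # SGBM returns fixed-point

    # depth = focal_length_px * baseline_m / disparity  (per pixel, where disparity > 0)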
Bio: Rose Rustowicz is a first year graduate student in Electrical Engineering, where she is focusing on imaging systems and machine learning. Rose currently works on dual camera systems for high quality image and depth estimation. With a background in Imaging Science, she is passionate about all things imaging, especially information extraction.
Title: Learning Light Field Features
Authors: Donald Dansereau, Jayant Thatte, Vincent Sitzmann, Bernd Girod, Gordon Wetzstein
Abstract: Feature detection and matching are the basis for a broad range of tasks in computer vision: Image registration, pose estimation, 3D reconstruction, place recognition, SfM and SLAM, and many other algorithms in computer vision rely directly on being able to identify and match features across images. While these approaches work relatively robustly over a range of applications, some remain out of reach due to poor performance in challenging conditions. Even infrequent failures can be unacceptable, e.g. in the case of self-driving vehicles. We show and analyze challenging cases where classic 2D feature detectors and descriptors fail, and propose to detect and describe features from 4D Light-Fields to deliver higher-quality detections and descriptors compared with the leading 2D features.
Bio: Vincent Sitzmann is a Ph.D. student in Electrical Engineering at Stanford University, advised by Prof. Gordon Wetzstein. His main research interest is in machine learning for computational imaging and computer vision.
Title: Stimulus dependency across the reading circuitry
Authors: Rosemary Le and Brian Wandell
Abstract: Neural responses to visual word forms are primarily analyzed in the visual word form area (VWFA). But signals must travel through the early visual areas before they arrive at the VWFA, and in this abstract we consider the early visual areas to be a part of the “reading circuitry”. We report that responses throughout the reading circuitry are stimulus-dependent, and quantify stimulus dependency by fitting population receptive fields (pRFs) to three stimulus types. We first report the distribution of pRFs for word stimuli, and compare these distributions to pRFs that are measured with false font and checkerboard stimuli. Compared to word stimuli, pRFs that are measured with false font stimuli only differ in the VWFA. Compared to checkerboard stimuli, differences are exhibited in the early visual areas in addition to the VWFA. We also analyze how the pRF distributions change while manipulating language, word size, and bar size. These analyses demonstrate that word-processing and top-down mechanisms have effects not only within the VWFA but also throughout the reading circuitry, and that pRFs can characterize the differing neural responses that are evoked by the stimulus.
Bio: Rosemary is a PhD student in the Department of Psychology working with Professor Brian Wandell. Her research uses neuroimaging to better understand vision and reading.
Title: Accommodation-invariant Computational Near-eye Displays
Authors: Robert Konrad, Nitish Padmanaban, Keenan Molner, Emily A. Cooper, Gordon Wetzstein
Abstract: Although emerging virtual and augmented reality (VR/AR) systems can produce highly immersive experiences, they can also cause visual discomfort, eyestrain, and nausea. One of the sources of these symptoms is a mismatch between vergence and focus cues. In current VR/AR near-eye displays, a stereoscopic image pair drives the vergence state of the human visual system to arbitrary distances, but the accommodation, or focus, state of the eyes is optically driven towards a fixed distance. In this work, we introduce a new display technology, dubbed accommodation-invariant (AI) near-eye displays, to improve the consistency of depth cues in near-eye displays. Rather than producing correct focus cues, AI displays are optically engineered to produce visual stimuli that are invariant to the accommodation state of the eye. The accommodation system can then be driven by stereoscopic cues, and the mismatch between vergence and accommodation state of the eyes is significantly reduced. We validate the principle of operation of AI displays using a prototype display that allows for the accommodation state of users to be measured while they view visual stimuli using multiple different display modes.
Bio: Robert is a 4th year PhD candidate in the Electrical Engineering Department at Stanford University, advised by Professor Gordon Wetzstein as part of the Stanford Computational Imaging Lab. His research interests lie at the intersection of computational displays and human physiology with a specific focus on virtual and augmented reality systems. He has recently worked on relieving vergence-accommodation and visual-vestibular conflicts present in current VR and AR displays, as well as a computationally efficient cinematic VR capture system. He received his Bachelor’s Degree from the ECE department at the University of Toronto in 2014, and his Master’s Degree from the EE Department at Stanford University in 2016.
Title: Dirty Pixels: Optimizing Image Classification Architectures for Raw Sensor Data
Authors: Steven Diamond, Vincent Sitzmann, Stephen Boyd, Gordon Wetzstein, Felix Heide
Abstract: Real-world sensors suffer from noise, blur, and other imperfections that make high-level computer vision tasks like scene segmentation, tracking, and scene understanding difficult. Making high-level computer vision networks robust is imperative for real-world applications like autonomous driving, robotics, and surveillance. We propose a novel end-to-end differentiable architecture for joint denoising, deblurring, and classification that makes classification robust to realistic noise and blur. The proposed architecture dramatically improves the accuracy of a classification network in low light and other challenging conditions, outperforming alternative approaches such as retraining the network on noisy and blurry images and preprocessing raw sensor inputs with conventional denoising and deblurring algorithms. The architecture learns denoising and deblurring pipelines optimized for classification whose outputs differ markedly from those of state-of-the-art denoising and deblurring methods, preserving fine detail at the cost of more noise and artifacts. Our results suggest that the best low-level image processing for computer vision is different from existing algorithms designed to produce visually pleasing images. The principles used to design the proposed architecture easily extend to other high-level computer vision tasks and image formation models, providing a general framework for integrating low-level and high-level image processing.
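As a hedged sketch of the end-to-end idea described above, the Python example below chains a small differentiable image-cleanup network to a classifier and backpropagates a single classification loss through both; the architecture, shapes, and data are illustrative assumptions, not the paper's network.

    # Minimal sketch: joint low-level cleanup + classification trained on one loss.
    # Assumes torchvision >= 0.13 for the `weights` keyword.
    import torch
    import torch.nn as nn
    from torchvision import models

    frontend = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1))                 # learned cleanup of noisy input
    classifier = models.resnet18(weights=None, num_classes=10)
    model = nn.Sequential(frontend, classifier)

    criterion = nn.CrossEntropyLoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    noisy = torch.randn(8, 3, 224, 224)                 # stand-in for noisy sensor images
    labels = torch.randint(0, 10, (8,))
    loss = criterion(model(noisy), labels)              # one joint loss drives both stages
    loss.backward()
    opt.step()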
Bios: Steven Diamond is a Ph.D. student in Computer Science at Stanford University, advised by Prof. Stephen Boyd. His main research interests are domain-specific languages for optimization, matrix-free optimization, and computational imaging.
Vincent Sitzmann is a Ph.D. student in Electrical Engineering at Stanford University, advised by Prof. Gordon Wetzstein. His main research interests are computer vision, machine learning and computational imaging.
Felix Heide is a postdoctoral researcher in Prof. Gordon Wetzstein’s group. His main research is in computational imaging and vision systems using large-scale optimization.
Title: Unrolled Optimization with Deep Priors
Authors: Steven Diamond, Vincent Sitzmann, Felix Heide, Gordon Wetzstein
Abstract: A broad class of problems at the core of computational imaging, sensing, and low-level computer vision reduces to the inverse problem of extracting latent images that follow a prior distribution, from measurements taken under a known physical image formation model. Traditionally, hand-crafted priors along with iterative optimization methods have been used to solve such problems. In this paper we present unrolled optimization with deep priors, a principled framework for infusing knowledge of the image formation into deep networks that solve inverse problems in imaging, inspired by classical iterative methods. We show that instances of the framework outperform the state-of-the-art by a substantial margin for a wide variety of imaging problems, such as denoising, deblurring, and compressed sensing magnetic resonance imaging (MRI). Moreover, we conduct experiments that explain how the framework is best used and why it outperforms previous methods.
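A minimal PyTorch sketch of the unrolled-optimization idea follows: a fixed number of proximal-gradient steps on a known forward operator, with each proximal step replaced by a small learned network. The module names, step count, and operator interface are assumptions rather than the paper's exact architecture.

    # Minimal sketch: unrolled proximal gradient with learned proximal operators.
    import torch
    import torch.nn as nn

    class ProxNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 3, padding=1))
        def forward(self, x):
            return x + self.net(x)          # residual denoising step as a learned prior

    class Unrolled(nn.Module):
        def __init__(self, A, At, K=5):
            super().__init__()
            self.A, self.At, self.K = A, At, K     # forward operator and its adjoint
            self.prox = nn.ModuleList(ProxNet() for _ in range(K))
            self.step = nn.Parameter(torch.full((K,), 0.1))
        def forward(self, y):
            x = self.At(y)
            for k in range(self.K):
                x = x - self.step[k] * self.At(self.A(x) - y)  # gradient on data term
                x = self.prox[k](x)                            # learned proximal step
            return x

The whole pipeline stays differentiable, so the per-step networks and step sizes can be trained end-to-end on pairs of measurements and ground-truth images.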
Bios: Steven Diamond is a Ph.D. student in Computer Science at Stanford University, advised by Prof. Stephen Boyd. His main research interests are domain-specific languages for optimization, matrix-free optimization, and computational imaging.
Vincent Sitzmann is a Ph.D. student in Electrical Engineering at Stanford University, advised by Prof. Gordon Wetzstein. His main research interests are computer vision, machine learning and computational imaging.
Felix Heide is a postdoctoral researcher in Prof. Gordon Wetzstein’s group. His main research is in computational imaging and vision systems using large-scale optimization.
Title: Non-Line-of-Sight Imaging from Light Detection and Ranging Specific Single Indirect Response
Authors: Chien-Yi Chang, Matthew O’Toole, David B Lindell, Gordon Wetzstein
Abstract: Non-line-of-sight (NLOS) imaging uses time-of-flight information of multiply scattered light to reconstruct the shape and albedo of a hidden scene. This paper considers how to efficiently recover the shape of an NLOS scene from the light detection and ranging (LiDAR) specific single indirect response (SIR). We describe an approximate Wiener deconvolution algorithm, the light cone transform (LCT), to reconstruct the shape of the NLOS scene in a computationally efficient manner. We evaluate the proposed algorithm’s performance against two commonly used methods (back projection and space carving) when they are applied to sparse SIR data gathered and downsampled from a single photon avalanche diode (SPAD) in an NLOS setup. Results show that the proposed algorithm provides better shape reconstruction and shorter computation time than the back projection and space carving algorithms. NLOS imaging from SIR may have considerable practical value for autonomous driving and endoscopy.
Bio:
Title: Single-Photon LIDAR with Deep Sensor Fusion
Authors: David B. Lindell, Matthew O’Toole, and Gordon Wetzstein
Abstract: Rapid advances in autonomous driving have motivated the development of advanced sensing technologies that can quickly map the surrounding environment. Light detection and ranging (LIDAR) systems in particular have garnered widespread use and attention because of their ability to build up accurate range maps from a set of long distance scans. However, such LIDAR systems remain limited by their relatively long scanning times and comparatively sparse depth maps. To address these limitations, we propose a new type of LIDAR system that acquires long-range, densely sampled depth images at a rapid framerate. The method relies on a sensor fusion algorithm that combines histograms of photon arrival times with a conventional intensity image using deep convolutional neural networks. We present simulated results demonstrating that this deep sensor fusion approach outperforms conventional range estimation techniques, such as matched filtering.
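The Python sketch below illustrates one plausible form of such a sensor fusion network: per-pixel photon-arrival histograms and a conventional intensity image are encoded separately, concatenated, and regressed to a dense depth map. The layer sizes and structure are assumptions, not the authors' model.

    # Minimal sketch: two-branch CNN fusing photon histograms with an intensity image.
    import torch
    import torch.nn as nn

    class FusionNet(nn.Module):
        def __init__(self, n_bins=64):
            super().__init__()
            self.hist_enc = nn.Sequential(nn.Conv2d(n_bins, 32, 3, padding=1), nn.ReLU())
            self.img_enc = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
            self.head = nn.Sequential(
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 1, 3, padding=1))            # per-pixel depth estimate
        def forward(self, hist, img):                      # hist: (B, n_bins, H, W)
            return self.head(torch.cat([self.hist_enc(hist), self.img_enc(img)], dim=1))

    net = FusionNet()
    depth = net(torch.rand(1, 64, 32, 32), torch.rand(1, 1, 32, 32))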
Bio: David Lindell is a 2nd-year Ph.D. student in the Stanford Computational Imaging Lab advised by Gordon Wetzstein. His research interests include time-of-flight sensors, non-line-of-sight imaging, and 3D imaging.
Title: Versatile Neural Modules for Flexible Learning in Embodied Visual Environments
Authors: Kevin T. Feigelis, Blue Sheffer, Daniel L.K. Yamins
Abstract: Animals (especially humans) have an amazing ability to learn new tasks quickly and switch between them flexibly. How brains support this ability is largely unknown, both neuroscientifically and algorithmically. We demonstrate that a modular learning system embodied inside a touchscreen environment is able to: 1) learn task-specific decision structures from scratch, 2) use these decision structures to efficiently solve tasks, and 3) rapidly repurpose this knowledge for use on new tasks.
Bio:
Title: Integrated All-Frequency Sound Synthesis for Computer Animation
Authors: Jui-Hsien Wang, Timothy R. Langlois, Ante Qu, Doug L. James
Abstract: We have seen great advances in physics-based sound synthesis that allow us to create plausible audio content for phenomena such as rigid-body contacts, fractures, water, and shells. However, the key issue of how the generated sound propagates has only been studied in restricted and sometimes ad hoc settings. In this project, we address this issue by developing a general-purpose time-domain acoustic wavesolver and a series of supporting algorithms that handle dynamic scenes and fully deformable geometries. This solver is based on an efficient framework that abstracts out different sound source models and can support multi-physics sound simulations. We demonstrate a set of examples that use these acoustic shaders and show how they can be naturally linked to prior work to create exciting, fully synchronized audio-visual content for computer animation.
Bio: Ante Qu is a second-year Computer Science PhD student working with Professor Doug James on physics-based simulations for computer animation. His interests include computer graphics, sound synthesis, and physics-based numerical simulations.
Title: Yarn-Level Cloth Simulation for Predictive Cloth Modeling
Authors: Jonathan Leaf, Raj Setaluri, Doug James
Abstract: Textiles and clothing account for $800 billion of worldwide exports, but modern design and manufacturing tools for knitted cloth are lacking. Physics-based simulation techniques using yarn-level cloth models can be leveraged to change the paradigm of knitted cloth design and manufacturing. We aim to enable predictive cloth modeling: allowing a user to design and accurately simulate knitted clothing dynamics before manufacturing.
Bio: Jonathan is a 5th year PhD student working with Prof. Doug James. He is interested in physics-based modeling, with a particular focus on knitted cloth mechanics.
Raj is a 2nd year PhD student working with Prof. Doug James, interested in physics-based modeling and its application to computer graphics and sound.
Title: Deep End-to-End Time-of-Flight Imaging
Authors: Shuochen Su, Felix Heide, Wolfgang Heidrich, Gordon Wetzstein
Abstract: We present an end-to-end image processing framework for time-of-flight (ToF) cameras. Existing ToF image processing pipelines consist of a sequence of operations including modulated exposures, denoising, phase unwrapping and multipath interference correction. While this cascaded modular design offers several benefits, such as closed-form solutions and power-efficient processing, it also suffers from error accumulation and information loss as each module can only observe the output from its direct predecessor, resulting in erroneous depth estimates. We depart from a conventional pipeline model and propose a deep convolutional neural network architecture that recovers scene depth directly from dual-frequency, raw ToF correlation measurements. To train this network, we simulate ToF images for a variety of scenes using a time-resolved renderer, devise depth-specific losses, and apply normalization and augmentation strategies to generalize this model to real captures. We demonstrate that the proposed network can efficiently exploit the spatio-temporal structures of ToF frequency measurements, and validate the performance of the joint multipath removal, denoising and phase unwrapping method on a wide range of challenging scenes.
Bio: Felix Heide is a postdoctoral researcher at Stanford University. He is interested in the theory and application of computational imaging and vision systems. Researching imaging systems end-to-end, Felix’s work lies at the intersection of optics, machine learning, optimization, computer graphics and computer vision. Felix has co-authored over 25 publications and filed 6 patents. He received his Ph.D. in December 2016 from the University of British Columbia under the advisement of Professor Wolfgang Heidrich. His doctoral dissertation focuses on machine learning for computational imaging and won the Alain Fournier Ph.D. Dissertation Award and the SIGGRAPH Outstanding Doctoral Dissertation Award in 2016.
Title: Revisiting Recurrent Neural Networks for Video-based Person Re-Identification
Authors: Jean-Baptiste Boin and Bernd Girod
Abstract: Person re-identification consists of associating different tracks of a person as they are captured across a scene. In video re-identification, the goal is to match a video of a person against a gallery of videos captured by different cameras. The challenges come from the variations in background, body pose, illumination and viewpoint.
This task has recently received rising attention due to the high performance achieved by new methods based on deep learning. In particular, in the context of video re-identification, many state-of-the-art works have explored the use of Recurrent Neural Networks to process the sequences. In this work, we revisit the use of this tool and show that simpler network architectures can yield similar performance. Moreover, this simplification allows modifications in the training process that can significantly improve the re-identification performance compared to existing techniques.
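As an example of the kind of simpler architecture referred to above, the PyTorch sketch below aggregates per-frame CNN embeddings by temporal average pooling instead of a recurrent layer; the backbone, embedding size, and training loss are assumptions, not the authors' exact model.

    # Minimal sketch: video embeddings via temporal average pooling of frame features.
    # Assumes torchvision >= 0.13 for the `weights` keyword.
    import torch
    import torch.nn as nn
    from torchvision import models

    class AvgPoolReID(nn.Module):
        def __init__(self, dim=128):
            super().__init__()
            backbone = models.resnet18(weights=None)
            backbone.fc = nn.Linear(backbone.fc.in_features, dim)
            self.backbone = backbone
        def forward(self, clips):                  # clips: (batch, frames, 3, H, W)
            b, t, c, h, w = clips.shape
            feats = self.backbone(clips.reshape(b * t, c, h, w))
            return feats.reshape(b, t, -1).mean(dim=1)   # temporal average pooling

    emb = AvgPoolReID()(torch.rand(2, 8, 3, 224, 224))   # one embedding per clip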
Bio: Jean-Baptiste Boin is a PhD candidate in the department of Electrical Engineering at Stanford, advised by Prof. Bernd Girod. His research interests lie at the intersection of image retrieval, computer vision, and augmented/virtual reality. He was funded for two years (2014-2016) by a Magic Grant from the Brown Institute for Media Innovation, for his work as the main developer on the Art++ project, a mobile augmented reality guide for museums that was deployed in Summer 2016 at the Cantor Arts Center.
Title: Attenuation-Based 3D Display Using Stacked LCD
Authors: Jason Ginsberg and Neil Movva
Abstract: Unlike traditional 2D displays, attenuation-based 3D displays enable the accurate, high-resolution depiction of motion parallax, occlusion, translucency, and specularity. We have implemented iterative tomographic reconstruction for image synthesis on a stack of spatial light modulators (multiple low-cost iPad LCDs). We illuminate these volumetric attenuators with a backlight to recreate a 4D target light field. Although five-layer decomposition generates the optimal tomographic reconstruction, our two-layer display costs less than $100 and requires less computation.
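For illustration, the Python sketch below solves a simplified two-layer attenuation decomposition in the log domain by projected gradient descent; the ray-to-pixel index arrays and target light field are hypothetical inputs, and the authors' iterative tomographic solver may differ.

    # Minimal sketch: two-layer log-domain attenuation decomposition.
    import numpy as np

    def decompose(logL, front_idx, rear_idx, n_front, n_rear, n_iter=500, lr=0.1):
        a = np.zeros(n_front)              # log-attenuation of front-layer pixels (<= 0)
        b = np.zeros(n_rear)               # log-attenuation of rear-layer pixels (<= 0)
        for _ in range(n_iter):
            r = a[front_idx] + b[rear_idx] - logL        # residual per ray
            a -= lr * np.bincount(front_idx, weights=r, minlength=n_front)
            b -= lr * np.bincount(rear_idx, weights=r, minlength=n_rear)
            a, b = np.minimum(a, 0), np.minimum(b, 0)    # attenuators cannot amplify
        return np.exp(a), np.exp(b)                      # per-pixel transmittances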
Bios: Jason Ginsberg is an undergraduate in Electrical Engineering at Stanford University, advised by Gordon Wetzstein. His interests include computational imaging, novel displays, and haptics.
Neil Movva is an undergraduate in Electrical Engineering at Stanford, advised by Christos Kozyrakis. He studies computer architecture with a focus on GPUs and deep learning.