SCIEN Affiliates Meeting Poster Presentations 2015

Posters

Image Capture:

Diffraction-aware Light Field Photography: Julie Chang, Xuemei Hu, Isaac Kauvar, Gordon Wetzstein (Qualcomm Award)

Color Architecture in Light Field Cameras: Trisha Lian, Brian Wandell (Google Award)

Doppler Time-of-Flight Imaging: Shikhar Shrestha, Felix Heide, Andrew Thomas Naber, Gordon Wetzstein (Meta Award)

3D bladder reconstruction from white light cystoscopy videos: Kristen L. Lurie, Dimitar V. Zlatev, Joseph C. Liao, Roland Angst, Audrey K. Bowden

Toward 3D Localization Microscopy With Convolutional Sparse Model: Hayato Ikoma, Gordon Wetzstein

Dielectric gradient metasurface optical elements: Dianmin Lin, Pengyu Fan, Erez Hasman, Mark L. Brongersma

Multisite two-photon three-dimensional random access in vivo calcium imaging: Samuel J. Yang, William E. Allen, Isaac V. Kauvar, Aaron Andalman, Noah Young, Christina K. Kim, Gordon Wetzstein, Karl Deisseroth

A Bayesian Approach for Single Photon Pile-up Correction: Rafael Setra, Felix Heide, Shikhar Shrestha, Matthew O’Toole, Gordon Wetzstein

Contrast-Enhanced Optical Coherence Tomography With Picomolar Sensitivity for Functional in-vivo Imaging: Orly Liba, Elliott SoRelle, Adam de la Zerda

A CMOS Vision Sensor for Always-On, Energy Constrained Applications: Chris Young, Alex Omid-Zohoor, Boris Murmann

Simulation of the effects of sensor and environmental parameters on image classification: Garikoitz Lerma-Usabiaga, Brian Wandell

 

Image Systems Architecture and Processing:

Frankencamera 4: A heterogeneous platform for image processing: Steven Bell, Jing Pu, Mark Horowitz

A novel image processing pipeline for mobile devices: Local, Linear and Learned (L3) pipeline: Qiyuan Tian, Haomiao Jiang, Steven Lansel, Joyce Farrell, Brian Wandell (Google Award)

Local Linear Approximation for Camera Image Processing Pipelines: Haomiao Jiang, Joyce Farrell, Brian Wandell (GoPro Award)

Simultaneous Estimation of Surface Reflectance and Fluorescence: Henryk Blasinski, Joyce Farrell, Brian Wandell (3M Award)

Improve Low-Rank Image Reconstruction with Nonlinear Kernels: Enhao Gong, Tao Zhang, Joseph Cheng, John Pauly

Deep convolutional neural network models of the retinal response to natural scenes: Lane McIntosh*, Niru Maheswaranathan*, Aran Nayebi, Surya Ganguli, Stephen Baccus (NVIDIA Award)

 

Depth, Scene Understanding and VR:

Depth Augmented Stereo Panoramas for Cinematic VR: Jayant Thatte, Jean-Baptiste Boin, Haricharan Lakshman, Gordon Wetzstein, Bernd Girod (Apple Award)

Stereo Panorama Generation for Cinematic VR: Kushagr Gupta, Suleman Kazi (DVDO Award)

Content Adaptive Representations of Omnidirectional Videos for Cinematic Virtual Reality: Matthew Yu, Haricharan Lakshman, Bernd Girod (Intel Award, DVDO Award)

Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views: Hao Su, Charles R. Qi, Yangyan Li, Leonidas J. Guibas (Huawei Award)

Real-time 3D Reconstruction with Global Pose Alignment: Angela Dai, Matthias Niessner, Michael Zollhoefer, Christian Theobalt, Pat Hanrahan (Intel Award)

PIGraphs: Learning Interaction Snapshots from Observations: Manolis Savva, Angel X. Chang, Pat Hanrahan, Matthew Fisher, Matthias Niessner

Data-driven Structural Priors for Shape Completion: Minhyuk Sung, Vladimir G. Kim, Roland Angst, and Leonidas Guibas

Text to 3D Scene Generation: Angel X. Chang, Manolis Savva, Pat Hanrahan, Christopher D. Manning

 

Displays:

Tackling the Vergence-Accommodation conflict with Lightfields and Monovision: Robert Konrad, Vasanth Mohan, Fu-Chung Huang, Gordon Wetzstein  (Facebook Award)

Adaptive Color Display via Perceptually-driven Factored Spectral Projection: Isaac Kauvar, Samuel J Yang, Liang Shi, Ian McDowall, Gordon Wetzstein

Inexpensive LED Video Wall Project: Matt Lathrop & Stephen Hitchcock

 

3D Printing:

Computational Lithography for Single Exposure 2.5D Printing: Leandra Brickson, Gordon Wetzstein, Matthew O’Toole

 

Applications:

Temporal Aggregation for Large-Scale Query-by-Image Video Retrieval: Andre Araujo, Jason Chaves, Roland Angst, Bernd Girod

Art++: Augmented reality in museums: Jean-Baptiste Boin, David Chen, Skanda Shridhar, Bernd Girod

Reality Informed VR: Matt Vitelli (Intel Award)

Applying a computer vision object tracking algorithm to detect musicians’ ancillary gestures: Madeline Huberth

Identifying Endangered Right Whales from Aerial Photographs: Catherine Mullings, Qingping He

Automating the Design of Game Visualizations: Abhijeet Mohapatra, Michael Genesereth


Abstracts


Title: Diffraction-aware Light Field Photography

Authors: Julie Chang, Xuemei Hu, Isaac Kauvar, Gordon Wetzstein

Abstract: Light fields have many applications in machine vision, consumer photography, robotics, and microscopy. However, the prevalent resolution limits of existing light field imaging systems hinder widespread adoption. In this paper, we analyze fundamental resolution limits of light field cameras in the diffraction limit. We propose a sequential, coded-aperture-style acquisition scheme that optimizes the resolution of a light field reconstructed from multiple photographs captured from different perspectives and f-number settings. We also show that the proposed acquisition scheme facilitates high dynamic range light field imaging and demonstrate a proof-of-concept prototype system. With this work, we hope to advance our understanding of the resolution limits of light field photography and develop practical computational imaging systems to overcome them.
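
To make the diffraction limit mentioned above concrete, here is a minimal back-of-the-envelope sketch (standard Airy-disk arithmetic, not the authors' analysis) of how the f-number bounds the spot size on the sensor; the wavelength, f-numbers, and function name are illustrative.

```python
# Diffraction-limited spot size on the sensor as a function of f-number.
# Generic Fourier-optics arithmetic, not the paper's resolution analysis.

def airy_spot_diameter_um(wavelength_nm: float, f_number: float) -> float:
    """Diameter of the Airy disk (to its first zero), in micrometers."""
    return 2.44 * (wavelength_nm * 1e-3) * f_number  # nm -> um

for n in [1.4, 2.8, 5.6, 11.0, 22.0]:
    d = airy_spot_diameter_um(550.0, n)  # green light, 550 nm
    print(f"f/{n}: spot ~ {d:.1f} um")

# At f/22 the spot (~30 um) covers many pixels of a typical sensor, which
# is why the f-number settings of the captured photographs matter for the
# resolution of the reconstructed light field.
```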

Student Bios: 

Julie Chang is a Ph.D. student in Bioengineering.

Xuemei Hu is a visiting Ph.D. student in Automation from Tsinghua University.

Isaac Kauvar is a Ph.D. student in Electrical Engineering.


Title: Color Architecture in Light Field Cameras

Authors: Trisha Lian and Brian Wandell

Abstract: We demonstrate the ability to simulate a full light field camera pipeline. We start from a virtual 3D scene, trace light through the appropriate lenses, image with a sensor, and finally process the image to be displayed. Through this simulation, we can vary parameters to prototype imaging systems and explore design trade-offs. For example, we use our simulation to experiment with different CFA (color filter array) designs in light field cameras, and compare and contrast their results.

Student Bios: Trisha Lian is a 2nd year PhD student in Electrical Engineering.


Title: Doppler Time-of-Flight Imaging

Authors: Shikhar Shrestha, Felix Heide, Andrew Thomas Naber, Gordon Wetzstein

Abstract: Over the last few years, depth cameras have become increasingly popular for a range of applications, including human-computer interaction and gaming, augmented reality, machine vision, and medical imaging. Many of the commercially available devices use the lock-in time-of-flight principle, where active illumination is temporally coded and analyzed in the camera to estimate a per-pixel depth map of the scene. In this work, we propose a fundamentally new imaging modality for all time-of-flight (ToF) cameras: per-pixel radial velocity measurement. The proposed technique exploits the Doppler effect of objects in motion, which shifts the temporal illumination frequency before it reaches the camera. Using carefully coded illumination and modulation frequencies of the ToF camera, object velocities directly map to measured pixel intensities. By extending the system to an array of synchronized ToF cameras with custom external control hardware, we can estimate the full 3D metric velocity of moving objects in the scene, capture velocity and range images simultaneously, estimate depth at higher effective frame rates, and capture ToF light fields, which might open doors to entirely new applications. We will demonstrate a working Doppler ToF system alongside the poster during the event.
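
The physical relationship the abstract relies on can be stated in a few lines. The sketch below shows only the textbook Doppler arithmetic for reflected illumination; the camera-side heterodyne decoding that turns the shift into pixel intensities is the poster's contribution and is not reproduced, and the modulation frequency and velocity used here are illustrative.

```python
# Doppler shift of amplitude-modulated illumination reflected off a
# moving object, and its inversion back to radial velocity (v << c).

C = 3.0e8  # speed of light, m/s

def doppler_shift_hz(radial_velocity_mps: float, mod_freq_hz: float) -> float:
    """Frequency shift seen by the camera for an object approaching at
    radial_velocity_mps under illumination modulated at mod_freq_hz."""
    return 2.0 * radial_velocity_mps * mod_freq_hz / C

def radial_velocity_mps(shift_hz: float, mod_freq_hz: float) -> float:
    """Invert a measured frequency shift to a per-pixel radial velocity."""
    return shift_hz * C / (2.0 * mod_freq_hz)

df = doppler_shift_hz(10.0, 30e6)          # 10 m/s at 30 MHz -> 2 Hz shift
print(df, radial_velocity_mps(df, 30e6))   # tiny shift, hence careful coding
```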

Student Bios: 

Shikhar Shrestha – MSEE ’16 (Software & Hardware Systems) Stanford; MSME ’15 (Mechatronics & Robotics) Stanford

Felix Heide – Visiting Student, Stanford; PhD Student in CS, UBC

Andrew Thomas Naber – PhD Student in EE, Stanford


Title: 3D bladder reconstruction from white light cystoscopy videos

Authors: Kristen L. Lurie, Dimitar V. Zlatev, Joseph C. Liao, Roland Angst, Audrey K. Bowden

Abstract: White light cystoscopy (WLC) is a ubiquitous imaging technique used to visualize the interior of the bladder wall for applications such as cancer surveillance, but the inability to correlate individual 2D images with 3D organ morphology limits its utility for quantitative or longitudinal studies of physiology, disease process, or cancer recurrence. As a result, most cystoscopy videos are used only for real-time guidance and are discarded after collection. To overcome this limitation, we developed a computational method to reconstruct and visualize a 3D model of the bladder from cystoscopy video that captures the shape and surface appearance of the bladder. We extended this method to also enable: (1) co-registration of two 3D models of the bladder to enable longitudinal tracking of the bladder appearance and (2) localization of microscopy data to the 3D model to generate comprehensive multimodal records. A key novelty of our strategy is the use of advanced computer vision techniques and unmodified, clinical-grade cystoscopy hardware with minor constraints on the video capture protocol, which presents a low barrier to practical clinical translation.

Student Bios: Kristen Lurie is a PhD student in Electrical Engineering and a member of the Stanford Biomedical Optics group led by Audrey K. (Ellerbee) Bowden. Her research interests lie at the intersection of biomedical optics and 3D computer vision. Her dissertation work is centered on multimodal endoscopy and complementary image reconstruction techniques for bladder cancer imaging.


Title:  Toward 3D Localization Microscopy With Convolutional Sparse Model

Authors: Hayato Ikoma, Gordon Wetzstein

Abstract: Localization-based super-resolution microscopy has advanced the resolution of optical microscopy to tens of nanometers. The method sparsely excites fluorescent molecules, localizes the molecules at the sub-pixel level, and accumulates the information to synthesize a single image achieving spatial resolution beyond the optical diffraction limit. Such localization has been extended to 3D with the use of an engineered point spread function encoding depth information. Although localization algorithms for isolated molecules have matured for both 2D and 3D, localization remains challenging for high-density images that contain spatially overlapping molecules. Due to this difficulty, a large number of captures is still required to reconstruct a super-resolution image. Among the algorithms trying to overcome this limitation, sparsity-promoting techniques have been shown to be effective for such high-density images. In this project, we formulate the localization problem by identifying the image formation model with a convolutional sparse model. With the continuous basis pursuit method, the localization is performed in the continuous domain.
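
For readers unfamiliar with sparsity-promoting localization, the discrete problem underneath is a sparse deconvolution. The sketch below solves it on a pixel grid with plain ISTA; the poster's continuous basis pursuit refines this idea to sub-grid positions, and the toy Gaussian PSF and parameters here are illustrative.

```python
import numpy as np

# Minimal ISTA solver for the discrete sparse-deconvolution problem
#   min_x 0.5*||A x - y||^2 + lam*||x||_1
# that underlies localization of overlapping emitters.

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=0.05, n_iter=200):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Toy example: Gaussian PSF dictionary, two nearby emitters.
grid = np.arange(64)
psf = lambda c: np.exp(-0.5 * ((grid - c) / 2.0) ** 2)
A = np.stack([psf(c) for c in grid], axis=1)   # one column per grid position
y = 1.0 * psf(30) + 0.7 * psf(34) + 0.01 * np.random.randn(64)
print(np.nonzero(ista(A, y) > 0.1)[0])         # recovered emitter locations
```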

Student Bios: Hayato Ikoma is a first-year Master's student in the Electrical Engineering program, working in the Computational Imaging Group at Stanford University. He is currently interested in the development of algorithms for optical imaging devices such as optical microscopes and satellite telescopes. He has been trained in multiple fields and received a B.E. in materials engineering from the University of Tokyo (Japan), an M.S. in biophysics from Kyoto University (Japan), an M.S. in media arts and sciences from MIT (USA), and an M.S. in mathematics from École Normale Supérieure de Cachan (France). His graduate study was supported by the Iwadare Foundation, the Funai Overseas Scholarship, and the Fondation Mathématique Jacques Hadamard.


Title: Dielectric gradient metasurface optical elements

Authors: Dianmin Lin, Pengyu Fan, Erez Hasman, Mark L. Brongersma

Abstract: Gradient metasurfaces are 2-dimensional optical elements capable of manipulating light by imparting local, space-variant phase changes on an incident electromagnetic wave. These surfaces have thus far been constructed from nanometallic optical antennas, and high diffraction efficiencies have been limited to operation in reflection mode. We describe the experimental realization and operation of dielectric gradient metasurface optical elements that also achieve high diffraction efficiency in transmission mode in the visible. Ultrathin gratings, lenses, and axicons have been realized by patterning a 100-nm-thin Si layer into a dense arrangement of Si nanobeam antennas. In addition to being ultrathin and compact, these multifunctional metasurfaces can provide entirely new functions that are very difficult or impossible to achieve with conventional optical components. The realization of metasurfaces opens up a wide variety of applications, especially in the field of computational imaging and display.

Student Bios: Dianmin Lin is a 5th year PhD student in the EE Department at Stanford. Her research focuses on gradient metasurfaces, using nanostructures to make ultrathin planar optical elements; this work was published in Science last year. She is now actively exploring the applications of metasurfaces in the field of computational imaging and display.


Title: Multisite two-photon three-dimensional random access in vivo calcium imaging

Authors: Samuel J. Yang, William E. Allen, Isaac V. Kauvar, Aaron Andalman, Noah Young, Christina K. Kim, Gordon Wetzstein, Karl Deisseroth

Abstract: A major goal in neurophysiology is to record the activity of large ensembles of neurons simultaneously in awake, behaving animals. This problem is challenging for three-dimensional (3D) volumes of neurons in scattering tissue. Here we demonstrate a computational optics approach toward multisite random-access three-dimensional in vivo recording of neural activity in scattering tissue volumes, using multifocal spatial-light-modulator-based two-photon (2P) illumination and coded detection with an sCMOS camera. Richardson-Lucy deconvolution allows for the reconstruction of cellular activity traces from the coded images. The technique operates without fast scanners and is relatively easily implemented. We are able to sample calcium signals at 10 Hz from 104 locations through a cranial window over S1 barrel cortex in head-fixed mice (N=2) virally transduced with GCaMP6m (AAVdj-Camk2a-GCaMP6m). This technique represents a promising approach to recording the activity of large neuronal populations in three dimensions in the cortex of awake, behaving animals; with further in vivo testing and optimization of single-cell spatial resolution, this approach may become a simpler alternative to fast single-point scanning approaches.
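
Richardson-Lucy deconvolution, named above as the reconstruction step, is compact enough to sketch. This is the textbook form for a shift-invariant PSF under Poisson noise, not the poster's coded-detection variant; the PSF and iteration count would be application-specific.

```python
import numpy as np
from scipy.signal import fftconvolve

# Textbook Richardson-Lucy deconvolution: multiplicative updates that
# move toward the maximum-likelihood image under Poisson noise.

def richardson_lucy(observed, psf, n_iter=50, eps=1e-12):
    """observed: blurred 2D image; psf: 2D point spread function."""
    estimate = np.full_like(observed, observed.mean(), dtype=float)
    psf_flipped = psf[::-1, ::-1]                 # adjoint of the convolution
    for _ in range(n_iter):
        blurred = fftconvolve(estimate, psf, mode="same")
        ratio = observed / (blurred + eps)        # data / current model
        estimate *= fftconvolve(ratio, psf_flipped, mode="same")
    return estimate
```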

Student Bios: Samuel Yang is a final-year Ph.D. candidate in Electrical Engineering. His research in the labs of Karl Deisseroth and Gordon Wetzstein focuses on computational microscopy and computational imaging and display. He studied optics and engineering physics at Caltech and computer vision and machine learning at Stanford, and has done research internships with Google Research and Google[x].


Title: A Bayesian Approach for Single Photon Pile-up Correction

Authors: Rafael Setra, Felix Heide, Shikhar Shrestha, Matthew O’Toole, and Gordon Wetzstein

Abstract: An existing issue in measuring temporal intensity profiles through time-correlated single-photon counting (TCSPC) is photon pile-up, a distortion in the resultant histogram which favors earlier arrivals. This distortion complicates several applications: fluorescence-lifetime imaging, positron emission tomography (PET), and transient imaging. Although correction algorithms exist, pile-up is commonly avoided through low-number photon emissions or time-gating, which becomes a time-consuming process limited by input intensity. Furthermore, these algorithms do not account for timing jitter. We take a Bayesian approach and construct a correction method for pile-up in the presence of jitter. The effectiveness of the new model is presented on simulated data and compared with existing methods.
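
For context, the classical (non-Bayesian) correction works roughly as follows: since the detector records at most one photon per excitation cycle, later bins are undercounted, and each bin can be reweighted by the probability that the detector was still available. A sketch of this Coates-style correction, which does not model jitter, is below; it is a baseline for contrast, not the Bayesian method on the poster.

```python
import numpy as np

# Coates-style pile-up correction for a TCSPC histogram.

def coates_correction(hist, n_cycles):
    """hist: photon counts per time bin; n_cycles: excitation cycles.
    Returns corrected expected counts per bin."""
    hist = np.asarray(hist, dtype=float)
    # Photons detected in earlier bins "block" later ones in the same cycle.
    earlier = np.concatenate([[0.0], np.cumsum(hist)[:-1]])
    p = hist / (n_cycles - earlier)     # detection prob. given availability
    return -n_cycles * np.log1p(-p)     # invert the censoring distortion
```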

Student Bios: 

Rafael Setra is a first-year Ph.D. student in Electrical Engineering with a focus in signal processing. Previously, he attended the University of Maryland at College Park, where he graduated with a double degree in Mathematics and Electrical Engineering.

Felix Heide is a visiting student from UBC working on a PhD in CS. Shikhar Shrestha is an EE Master’s student at Stanford.

Matthew O’Toole is a CS PhD student from the University of Toronto.


Title:  Contrast-Enhanced Optical Coherence Tomography With Picomolar Sensitivity for Functional in-vivo Imaging

Authors: Orly Liba, Elliott SoRelle, Adam de la Zerda

Abstract: Optical Coherence Tomography (OCT) uses low-coherence interferometry to provide micron-resolution images of scattering from structures millimeters deep in tissue. Here we demonstrate that the use of exogenous imaging agents with OCT provides a promising platform for studying the functional biology of tissues in vivo. In this study, we developed and applied highly-scattering large gold nanorods (LGNRs) and custom spectral detection algorithms for contrast-enhanced OCT. We used this approach for noninvasive 3D imaging of blood vessels deep in solid tumors in living mice. Additionally, we demonstrated multiplexed imaging of spectrally-distinct LGNRs that enabled observations of functional drainage in lymphatic networks. This method, which we call MOZART, provides a platform for molecular imaging and characterization of tissue noninvasively at cellular resolution.

Student Bios:  

Orly Liba is a PhD candidate in the Department of Electrical Engineering at Stanford. Before Stanford, she worked as an algorithms engineer and team leader in image processing at two startups. Orly received an M.S. in Electrical Engineering from Tel-Aviv University and a B.Sc. in Electrical Engineering and a B.A. in Physics from the Technion.


Title: A CMOS Vision Sensor for Always-On, Energy Constrained Applications

Authors: Chris Young, Alex Omid-Zohoor, Boris Murmann

Abstract: Traditionally, there is a disconnect between hardware and algorithmic implementation in real-time, energy-constrained object detection applications. Often, images with eight or more bits per pixel are passed through an image processing pipeline only to have their dimensionality reduced during a feature extraction step. We aim to demonstrate a CMOS imager in which features are generated at the column readout level using mixed-signal circuitry, reducing data throughput by an order of magnitude. We propose modified Histogram-of-Oriented-Gradients features designed for CMOS implementation and robustness to practical illumination conditions. We present simulation results showing our features achieving detection performance comparable with traditional methods despite being quantized to as few as two bits. We also show our proposed circuit architecture, currently in development.
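
To illustrate the kind of feature the abstract refers to, here is a conventional software sketch of gradient-orientation histograms coarsely quantized to two bits. The cell size, bin count, and quantization scheme are illustrative assumptions, not the modified HOG variant proposed on the poster.

```python
import numpy as np

# Software sketch of HOG-like features quantized to 2 bits per value.

def hog_2bit(image, cell=8, n_bins=4):
    """Per-cell orientation histograms, quantized to 2 bits (values 0..3)."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # unsigned gradients
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    h, w = image.shape
    feats = np.zeros((h // cell, w // cell, n_bins))
    for i in range(h // cell):
        for j in range(w // cell):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = bins[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            for k in range(n_bins):
                feats[i, j, k] = m[b == k].sum()       # magnitude-weighted bin
    feats /= feats.max() + 1e-12                       # normalize to [0, 1]
    return np.minimum((feats * 4).astype(int), 3)      # 2-bit quantization
```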

Student Bios:

Chris Young is currently pursuing his PhD at Stanford in Electrical Engineering. His work centers on mixed-signal integrated circuits for feature extraction in low-power sensor interfaces with classifier backends. He has interned at Apple, Texas Instruments, and Bosch.

Alex Omid-Zohoor is a PhD student at Stanford, studying hardware implementations for computer vision pipelines. He also serves as the EE Undergraduate Advising TA, and has interned at Mindtribe Product Engineering.


Title:  Simulation of the effects of sensor and environmental parameters on image classification

Authors:  Garikoitz Lerma-Usabiaga, Brian Wandell

Abstract: The use of image sensors is expected to grow considerably as part of new autonomous vehicles, drones, robotics, and the Internet of Things. In many of these applications, the sensor data’s primary purpose will be to provide input for computations that classify and identify image content. We describe a simulation approach for (a) optimizing sensor design for these applications, and (b) assessing system performance under different environmental conditions. We combined the Image Systems Engineering Toolbox (ISET) with a convolutional neural network (CNN) implemented in TensorFlow™. The CNN classifies the raw sensor data, without applying conventional image processing. We tested the system’s ability to classify written characters presented on an LED road sign over a range of distances and luminance levels, and we measured how classification performance varies with imaging conditions and sensor parameters. These experiments illustrate how to use simulation tools to guide the design and assess the performance of imaging systems for these new applications.

Student Bios: Garikoitz Lerma-Usabiaga is a visiting student at Stanford University. An electrical engineer by training, after some years in industry he is now pursuing a PhD in Neuroscience at the BCBL, San Sebastián (Spain). His interests are human vision and the low-level processes of reading, structural MRI methods, and AI/machine learning.


Title: Frankencamera 4: A heterogeneous platform for image processing

Authors: Steven Bell, Jing Pu, Mark Horowitz

Abstract: Computational photography is beginning to go mainstream, and augmented reality and other computer vision applications are starting to make their way onto cell phones. However, two hurdles impede the adoption of these applications. First, conventional CPUs and even GPUs do not have the power efficiency to run these applications on mobile devices for prolonged periods of time. Second, most camera platforms are closed, and offer very little control of how the camera actually captures and processes its images. With the Frankencamera 4 research project, we are investigating hardware and software designs to overcome these barriers. We compile image processing algorithms written in a high-level language into FPGA or software designs, and link them together with an API that abstracts away the details of the underlying hardware. To test our ideas, we’re building a physical camera with an FPGA at its core as a platform for computational photography research.

Student Bios:

Steven Bell is a PhD candidate in Electrical Engineering, with interests at the intersection of image processing, computational photography, embedded systems, and software system design.  He received an M.S. in EE from Stanford in 2013, and a B.S. in Computer Engineering from Oklahoma Christian University in 2011.

Jing Pu received a B.S. in microelectronics from Peking University and an M.S. in electrical engineering from Stanford University. He is currently pursuing a PhD at Stanford University. His research interests include VLSI design, computer architecture, graphics and imaging systems. He is currently working on building an energy efficient programmable architecture for computer vision applications.


Title: A novel image processing pipeline for mobile devices: Local, Linear and Learned (L3) pipeline

Authors: Qiyuan Tian, Haomiao Jiang, Steven Lansel, Joyce Farrell, Brian Wandell

Abstract: The high density of pixels in modern color sensors provides an opportunity for new color filter array (CFA) designs. By increasing the number and type of color filters, it is possible to improve sensor low-light sensitivity, dynamic range, and color accuracy, and to capture additional multispectral, IR, polarization, and light field information. However, novel CFAs require new image processing pipelines, and developing these takes considerable time and effort. To address this issue, we developed a method (Local, Linear, Learned or L3) that automatically creates an image processing pipeline for any CFA. The L3 pipeline is a new imaging architecture based on look-up tables that store pre-computed linear transforms for different classes of sensor pixels. The optimal transforms are learned in a data-driven way from training data produced by camera simulation. The L3 pipeline applies the pre-computed linear transform associated with each class to translate the pixel values into the target output representation. This methodology integrates multiple processing steps into one transform, reduces runtime computation, and allows for parallel computing acceleration. In this poster we describe the L3 algorithm in detail and illustrate how we created a pipeline for CFAs containing clear/white pixels.
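
A minimal sketch of the runtime side of such a pipeline: each pixel is assigned a class, a precomputed linear transform is looked up for that class, and the transform is applied to the pixel's local patch. The class definition (CFA position plus a coarse response level), shapes, and names here are simplified assumptions, not the published L3 classifier.

```python
import numpy as np

# Runtime side of an L3-style pipeline: per-pixel class lookup followed
# by one linear transform from raw patch to output RGB.

def l3_render(raw, cfa_pos, transforms, patch=5, n_levels=8):
    """raw: mosaicked sensor image; cfa_pos: per-pixel CFA index;
    transforms[(cfa, level)]: (3, patch*patch) linear map to RGB."""
    h, w = raw.shape
    r = patch // 2
    out = np.zeros((h, w, 3))
    # Coarse response level serves as the second half of the class label.
    levels = np.minimum((raw / raw.max() * n_levels).astype(int), n_levels - 1)
    for y in range(r, h - r):
        for x in range(r, w - r):
            p = raw[y-r:y+r+1, x-r:x+r+1].ravel()       # local patch
            T = transforms[(cfa_pos[y, x], levels[y, x])]
            out[y, x] = T @ p                           # one linear step
    return out
```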

Student Bios: Qiyuan Tian is a Ph.D. candidate in the Department of Electrical Engineering. Qiyuan received a B.Eng. (2011) in Communication Science and Engineering from Fudan University, China, and an M.S. (2013) in Electrical Engineering from Stanford University. He studied as an undergraduate exchange student (2009) in the Department of Electronic and Computer Engineering at The Hong Kong University of Science and Technology. His research interest is imaging systems.


Title:  Local Linear Approximation for Camera Image Processing Pipelines

Authors: Haomiao Jiang, Joyce Farrell, Brian Wandell

Abstract: We propose a method to approximate the existing multi-step camera image processing pipeline in one step with local linear filters. The filters are learned from pairs of camera raw and RGB images. The experimental results demonstrate that the complicated pipelines in existing digital cameras can be well approximated by the proposed method, and show the potential of optimizing such pipelines under the proposed scheme for efficiency and low power.
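
The learning step reduces to a regression. Assuming aligned (raw patch, rendered RGB) training pairs have already been extracted and grouped by pixel class, here is a hedged sketch of fitting one filter per class by ridge-regularized least squares; function and parameter names are illustrative.

```python
import numpy as np

# Training side of the local-linear idea: one (3, d) filter per class,
# fit by regularized least squares from raw patches and target RGB.

def fit_linear_filter(patches, targets, ridge=1e-3):
    """patches: (n, d) raw patches; targets: (n, 3) RGB values.
    Minimizes ||P W - Y||^2 + ridge*||W||^2 and returns W^T as (3, d)."""
    P, Y = np.asarray(patches), np.asarray(targets)
    d = P.shape[1]
    W = np.linalg.solve(P.T @ P + ridge * np.eye(d), P.T @ Y)  # (d, 3)
    return W.T
```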

Student Bios: Haomiao Jiang is a PhD candidate in Electrical Engineering at Stanford University. He is supervised by Prof. Brian Wandell and works on automated camera image processing pipelines, display modeling, color vision, and human vision front-end simulation.


Title: Simultaneous Estimation of Surface Reflectance and Fluorescence

Authors: Henryk Blasinski, Joyce Farrell, Brian Wandell

Abstract: Light can interact with objects in two different ways. The first type of interaction occurs when photons are passively reflected by a surface. In this case incident and reflected photon wavelengths do not change, but some photons are absorbed by the object, reducing the reflected light intensity. The second type of interaction, called fluorescence, is an active process: photons at one wavelength are absorbed and re-emitted at a different wavelength. The two types of interaction provide complementary information about material properties and are often exploited in biology and medicine. Conventional imaging systems, however, cannot discriminate between reflected and fluoresced photons and thus fail to fully characterize different surfaces. I will present a simple and inexpensive computational imaging system capable of disambiguating the reflected and fluoresced components. The system is composed of a multi-band camera and a small number of narrowband LED lights. Surface reflectance and fluorescence properties are estimated from a sequence of images captured under narrowband lights, by solving an inverse estimation problem. This estimation problem uses an image formation model expressing captured pixel intensities in terms of system, surface reflectance, and fluorescence properties. I will also describe how we plan to use our system in large in-situ fluorescence imaging studies to diagnose and monitor the health of coral reefs.

Student Bios: Henryk Blasinski received the M.S. degree (Hons.) in telecommunications and computer science from the Lodz University of Technology, Lodz, Poland, and the Diplôme d’Ingénieur from the Institut Supérieur d’Électronique de Paris, France, in 2008 and 2009, respectively. He was a Fulbright Scholar with the Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, from 2010 to 2011. At present he is pursuing a Ph.D. degree in the Department of Electrical Engineering, Stanford University, CA. Henryk’s research interests include image processing, human and computer vision, and machine learning. Henryk is a recipient of several awards, including the Fulbright Fellowship, the Fellowship from the Minister of Higher Education of the Republic of Poland, the Polish Talents Award, the DP Systems Award, the Fellowship of the Lodz Region Marshall, the Crawford Prize for the best M.Sc. project, and the 2014 SPIE Digital Photography X Best Paper Award.


Title:  Improve Low-Rank Image Reconstruction with Nonlinear Kernels

Authors: Enhao Gong, Tao Zhang, Joseph Cheng, John Pauly

Abstract: Low-Rank (LR)/Partially-Separable models are widely applied in sparse signal reconstruction applications such as video processing, image denoising, and dynamic MRI reconstruction. LR enables accelerated acquisition and improves sparse reconstruction by exploiting spatial-temporal correlation. Existing LR methods model the spatial-temporal signal as a sparse and linear combination of basis signals. However, these linear models cannot fully capture complex signal changes and may lead to inaccurate contrast dynamics in reconstruction. Dictionary-learning-based methods tackle this issue using over-complete dictionaries, which require expensive computation and training data. Here we propose a generalized LR model with adaptive nonlinear kernels. This Kernelized Low-Rank (KLR) model assumes the LR property in a nonlinear-transform domain instead of the original spatial-temporal domain. The proposed method captures the spatial-temporal dynamics with sparser nonlinear representations and achieves more accurate reconstruction results.

Student Bios: Enhao Gong is a Ph.D. student in the Department of Electrical Engineering at Stanford. His research focuses on compressed sensing, machine learning, and computer vision algorithms and their applications in image processing and Magnetic Resonance Imaging (MRI) reconstruction.


Title: Deep convolutional neural network models of the retinal response to natural scenes

Authors: Lane McIntosh*, Niru Maheswaranathan*, Aran Nayebi, Surya Ganguli, Stephen Baccus

Abstract: In order to understand how and why biological vision pathways perform particular computations, we must first know what they do. In this work we demonstrate that convolutional neural networks (CNNs) are considerably more accurate at capturing retinal responses to held-out natural scene stimuli than linear-nonlinear (LN) models and related models, such as generalized linear models (GLMs). Furthermore, we find CNNs generalize significantly better across classes of stimuli (white noise vs. natural scenes) they were not trained on. Remarkably, analysis of these CNNs reveals internal units selective for visual features on the same small spatial scale as the main excitatory interneurons of the retina, bipolar cells. Moreover, probing the model with reversing gratings, paired flashes, and contrast steps reveals that the CNN learns nonlinear retinal response properties such as frequency doubling and adaptation, even though the CNNs were not trained on such stimuli. Overall, this work demonstrates the power of CNNs to not only accurately capture sensory circuit responses to natural scenes, but also uncover the circuit’s internal structure and function.

Student Bios:

Lane McIntosh is a Neurosciences PhD candidate at Stanford University. His research utilizes tools from machine learning, information theory, and high-dimensional data analysis to uncover general principles of information transmission in the early stages of vision. Lane completed an M.A. in Mathematics at the University of Hawaii, where he was an NSF fellow, and a B.A. in Computational Neuroscience from the University of Chicago.

Niru Maheswaranathan is a PhD candidate in the Neurosciences program at Stanford University. He is interested in neural coding and sensory neuroscience. Currently, Niru is working on understanding how the retina, the first stage of visual processing, encodes natural images. Niru completed his B.S.E. in Biomedical and Electrical & Computer Engineering at Duke University.

Aran Nayebi is a Master’s student in Computer Science at Stanford University. Aran is interested in information processing in neural systems, with a particular focus on sensory systems such as the retina. Aran brings tools from machine learning to bear on systems neuroscience problems. Aran completed his B.S. in Mathematics and Symbolic Systems at Stanford University.


Title: Depth Augmented Stereo Panoramas for Cinematic VR

Authors: Jayant Thatte, Jean-Baptiste Boin, Haricharan Lakshman, Gordon Wetzstein, Bernd Girod

Abstract: Cinematic virtual reality (VR) aims to provide immersive visual experiences of real-world scenes on head-mounted displays. Current cinematic VR systems employ omnidirectional stereo videos from a fixed head position and hence do not address head motion parallax, which is an important cue for depth perception. Similarly, they do not address focus cues, which are important for depth perception as well as for comfortable viewing. We propose a new stereo content representation, referred to as depth augmented stereo panorama (DASP), to address both these issues. DASP is developed considering the data capture, postproduction, streaming, and rendering stages of the cinematic VR pipeline. The capabilities of this representation are evaluated by comparing the generated viewports with those of known 3D models. Results indicate that DASP can successfully create stereo, induce accurate head motion parallax, and support focus cues within a predefined operating range.

Student Bios: 

Jayant Thatte is a 2nd year Ph.D. candidate in the Electrical Engineering Department at Stanford University.

Jean-Baptiste Boin is a 4th year Ph.D. candidate in the Electrical Engineering Department at Stanford University.

Hari Lakshman is a visiting assistant professor in the Electrical Engineering Department at Stanford University.


Title: Stereo Panorama Generation for Cinematic VR

Authors: Kushagr Gupta, Suleman Kazi

Abstract: This project describes the generation of a stereo panorama from images acquired by rotating a single camera. The camera is mounted on a robotic arm, which ensures stability and a steady rotational velocity. Strips are taken from the right and left sides of the collected images and mosaicked together to form two images: the strips from the right side form the image for the left eye, and vice versa. An automatic disparity control algorithm is then applied to the resulting stereo pair. The stereo panorama can be viewed on virtual reality displays or using other methods such as anaglyphs, with the primary application being cinematic virtual reality, where depth perception gives the user an immersive experience.
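
The core strip-mosaicking step is simple enough to sketch. Assuming a list of frames from a steadily rotating camera, strips taken to the right of center feed the left-eye panorama and vice versa; the strip width and offset below are illustrative, and disparity control and blending are omitted.

```python
import numpy as np

# Strip mosaicking for a stereo panorama from a single rotating camera.

def stereo_panorama(frames, strip_width=8, offset=80):
    """frames: list of (H, W, 3) images from a steadily rotating camera.
    Returns (left_eye_panorama, right_eye_panorama)."""
    left_eye, right_eye = [], []
    for f in frames:
        c = f.shape[1] // 2
        # Strips right of center -> left eye; left of center -> right eye.
        left_eye.append(f[:, c + offset : c + offset + strip_width])
        right_eye.append(f[:, c - offset - strip_width : c - offset])
    return np.hstack(left_eye), np.hstack(right_eye)
```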

Student Bios:

Kushagr Gupta is presently pursuing his Master’s in Electrical Engineering at Stanford University. Broadly, his areas of interest are signal processing, robotics, and vision. His current work and interest is in integrating machine learning and artificial intelligence techniques with image processing, for applications ranging from segmentation and classification to visual recognition.

Suleman Kazi is from Pakistan and is presently pursuing his Master’s in Electrical Engineering at Stanford University. His areas of interest are robotics, signal processing, and optimization.


Title: Content Adaptive Representations of Omnidirectional Videos for Cinematic Virtual Reality

Authors: Matthew Yu, Haricharan Lakshman, Bernd Girod

Abstract: Cinematic virtual reality provides an immersive visual experience by presenting omnidirectional videos of real-world scenes. A key challenge is to develop efficient representations of omnidirectional videos in order to maximize coding efficiency under resource constraints, specifically, number of samples and bitrate. We formulate the choice of representation as a multi-dimensional, multiple-choice knapsack problem and show that the resulting representations adapt well to varying content. We also show that separation of the sampling and bit allocation constraints leads to a computationally efficient solution using Lagrangian optimization with only minor performance loss. Results across images and videos show significant coding gains over standard representations.
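
The Lagrangian decomposition mentioned above has a compact generic form: for a fixed multiplier, each tile of the video independently picks the representation minimizing distortion plus weighted rate, and the multiplier is searched until the rate budget is met. The sketch below is that generic form with hypothetical per-tile option lists, not the paper's exact formulation, which additionally separates the sampling and bitrate constraints.

```python
# Generic Lagrangian selection for a multiple-choice knapsack.

def select_representations(options, lam):
    """options: for each tile, a list of (distortion, rate) choices.
    Returns per-tile picks and the total rate for multiplier lam."""
    picks = [min(opts, key=lambda dr: dr[0] + lam * dr[1]) for opts in options]
    return picks, sum(r for _, r in picks)

def meet_rate_budget(options, budget, lo=0.0, hi=1e6, n_iter=50):
    """Bisect on the multiplier until the rate budget is (nearly) met."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        _, rate = select_representations(options, mid)
        lo, hi = (lo, mid) if rate <= budget else (mid, hi)
    return select_representations(options, hi)[0]  # feasible selection
```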

Student Bios: Matt Yu is a PhD candidate in the Electrical Engineering Department at Stanford University. He received a M.S. degree from Stanford and a B.S. degree from California Institute of Technology, both in electrical engineering. His primary research interests are computational photography, computer vision, and video streaming although he has also worked on personalization systems and illumination-invariant image matching. Matt was a recipient of a 2014-15 Brown Institute for Media Innovation Magic Grant where he worked on the virtual reality documentary, Reframe Iran.


Title: Temporal Aggregation for Large-Scale Query-by-Image Video Retrieval

Authors: Andre Araujo, Jason Chaves, Roland Angst, Bernd Girod

Abstract: We address the challenge of using image queries to retrieve video clips from a large database. Using binarized Fisher Vectors as global signatures, we present three novel contributions. First, an asymmetric comparison scheme for binarized Fisher Vectors is shown to boost retrieval performance by 0.27 mean Average Precision, exploiting the fact that query images contain much less clutter than database videos. Second, aggregation of frame-based local features over shots is shown to achieve retrieval performance comparable to aggregation of those local features over single frames, while reducing retrieval latency and memory requirements by more than 3X. Several shot aggregation strategies are compared and results indicate that most perform equally well. Third, aggregation over scenes, in combination with shot signatures, is shown to achieve one order of magnitude faster retrieval at comparable performance. Scene aggregation also outperforms the recently proposed aggregation in random groups.

Student Bios: André Araujo is a PhD candidate in electrical engineering at Stanford University, advised by Prof. Bernd Girod. André’s PhD thesis focuses on image and video retrieval from large databases, although he has also worked on image classification and video compression during his PhD. André has worked on research and engineering projects with several companies, such as Google, Technicolor and Arcelor-Mittal. He is a Fulbright Science & Technology scholar and an Accel Innovation Scholar. He holds an MSEE from the University of Campinas, Brazil (focusing on video compression), and a BSEE with honors from a double-degree program between the University of Campinas, Brazil and the Institut National des Sciences Appliquées de Lyon, France.


Title:  Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views

Authors: Hao Su, Charles R. Qi, Yangyan Li, Leonidas J. Guibas

Abstract: Object viewpoint estimation from 2D images is an essential task in computer vision. However, two issues hinder its progress: scarcity of training data with viewpoint annotations, and a lack of powerful features. Inspired by the growing availability of 3D models, we propose a framework to address both issues by combining render-based image synthesis and CNNs (Convolutional Neural Networks). We believe that 3D models have the potential to generate a large number of images of high variation, which can be well exploited by deep CNNs with high learning capacity. Towards this goal, we propose a scalable and overfit-resistant image synthesis pipeline, together with a novel CNN specifically tailored for the viewpoint estimation task. Experimentally, we show that viewpoint estimation from our pipeline significantly outperforms state-of-the-art methods on the PASCAL 3D+ benchmark.

Student Bios: Charles Ruizhongtai Qi is a PhD candidate in Stanford EE. His research focuses on machine learning, computer vision, and graphics, especially on using deep learning to understand the 3D world for future robotics and augmented reality applications. Charles obtained his bachelor’s degree from Tsinghua University and has had exchange experiences at Aalto University (Finland), Carnegie Mellon University, and the University of Notre Dame.


Title: Real-time 3D Reconstruction with Global Pose Alignment

Authors: Angela Dai, Matthias Niessner, Michael Zollhoefer, Christian Theobalt, Pat Hanrahan

Abstract: We present a novel real-time 3D reconstruction approach that jointly solves for global pose alignment while generating and visualizing an accurate 3D reconstruction during the scanning process. In particular, we tackle the problem of geometric drift and tracking failure which is inherent to the model-to-frame tracking used by state-of-the-art real-time reconstruction methods. Our approach provides robust global tracking and implicitly solves the loop closure problem by globally optimizing the pose trajectory for every captured frame. Hence, we can easily recover from dead reckoning in feature-less regions and revisit previously scanned areas in order to obtain complete and accurate reconstructions. As we generate a 3D reconstruction on-the-fly, our proposed framework facilitates easy and robust large-scale digitization with just a single hand-held, commodity RGB-D sensor. Our approach outperforms current state-of-the-art online reconstruction systems not only in robustness and completeness, but also in terms of globally obtained accuracy. Overall, we obtain globally-aligned 3D reconstructions in a real-time setup at a reconstruction quality that was previously only attainable with offline methods.

Student Bios:  Angela Dai is a third year PhD student advised by Pat Hanrahan. Her research focuses on 3D scanning and reconstruction.


Title: PIGraphs: Learning Interaction Snapshots from Observations

Authors: Manolis Savva, Angel X. Chang, Pat Hanrahan, Matthew Fisher, Matthias Niessner

Abstract: We present a method to generate Interaction Snapshots: static depictions of human poses and relevant objects during human-object interactions. Using commodity depth sensors, we observe people acting within real-world environments. We demonstrate that augmenting geometry with a probabilistic model of human interactions can enable a novel human-centric understanding of 3D content.

Student Bios: Manolis Savva is a PhD student in the computer graphics lab.  His research focuses on 3D scene understanding, connecting 3D content with common sense knowledge, and modeling of human interactions with 3D environments.


Title:  Data-driven Structural Priors for Shape Completion

Authors: Minhyuk Sung, Vladimir G. Kim, Roland Angst, and Leonidas Guibas

Abstract: We propose a novel data-driven shape completion algorithm that leverages both a database of exemplar segmented shapes and symmetry priors to estimate the geometry of occluded regions. Our main contribution is a method that predicts part and symmetry structure from a partial scan.

Student Bios: Minhyuk Sung is a third-year Ph.D. student in the CS department, advised by Leonidas Guibas.


Title: Text to 3D Scene Generation

Authors: Angel X. Chang, Manolis Savva, Pat Hanrahan, Christopher D. Manning

Abstract: Designing 3D scenes is currently a creative task that requires significant expertise and effort in using complex 3D design interfaces. This stands in contrast to the ease with which people can use language to describe real and imaginary environments. We present an interactive text to 3D scene generation system that allows a user to design 3D scenes using natural language. A user provides input text from which we extract explicit constraints on the objects that should appear in the scene. Given these explicit constraints, the system then uses a spatial knowledge base learned from an existing database of 3D scenes and 3D object models to infer an arrangement of the objects forming a natural scene matching the input description. Using textual commands, the user can then iteratively refine the created scene by adding, removing, replacing, and manipulating objects.

Student Bios: Angel Chang is a graduating PhD student advised by Chris Manning in the NLP group.  Her research focuses on the intersection of natural language understanding, computer graphics, and AI.


Title: Tackling the Vergence-Accommodation conflict with Lightfields and Monovision

Authors: Robert Konrad, Vasanth Mohan, Fu-Chung Huang, Gordon Wetzstein

Abstract: Emerging virtual reality (VR) displays must overcome the prevalent issue of visual discomfort to provide high-quality and comfortable user experiences. In particular, the mismatch between vergence and accommodation cues inherent to most stereoscopic displays has been a long-standing challenge. We evaluate several display modes that promise to mitigate visual discomfort caused by the vergence-accommodation conflict and improve comfort as well as performance in VR environments. In particular, we explore monovision as an unconventional mode that accommodates each eye of the observer at a different depth. While this technique is common practice in ophthalmology, we are the first to test its effectiveness for VR applications with a custom-built focus-tunable display. We will also present the light field stereoscope, first shown at SIGGRAPH 2015, which provides accurate focus cues for the user, hence reducing the vergence-accommodation conflict.

Student Bios: 

Robert Konrad is a PhD student in the Stanford Computational Imaging group led by Professor Gordon Wetzstein. He is investigating the use of focus-tunable optics to alleviate the vergence-accommodation conflict in near-eye displays.

Vasanth Mohan is currently a masters student in Electrical Engineering and finished his undergraduate degree in Computer Science in 2015 from Stanford.  He is very passionate about virtual reality technologies and is excited about all of the challenges this new medium faces.


Title:  Adaptive Color Display via Perceptually-driven Factored Spectral Projection

Authors: Isaac Kauvar, Samuel J Yang, Liang Shi, Ian McDowall, Gordon Wetzstein

Abstract: Fundamental display characteristics are constantly being improved, especially resolution, dynamic range, and color reproduction. However, whereas high-resolution and high-dynamic-range displays have matured as a technology, it remains largely unclear how to extend the color gamut of a display without either sacrificing light throughput or making other tradeoffs. Here, we advocate for adaptive color display; with hardware implementations that allow color primaries to be dynamically chosen, an optimal gamut and corresponding pixel states can be computed in a content-adaptive and user-centric manner. We build a flexible gamut projector and develop a perceptually-driven optimization framework that robustly factors a wide color gamut target image into a set of time-multiplexed primaries and corresponding pixel values. We demonstrate that adaptive primary selection has many benefits over fixed gamut selection and show that our algorithm for joint primary selection and gamut mapping performs better than existing methods. Finally, we evaluate the proposed computational display system extensively in simulation and, via photographs and user experiments, with a prototype adaptive color projector.
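
In its simplest (non-perceptual) form, the factorization described above is a nonnegative matrix factorization: a spectral target image is approximated as per-pixel weights times a small set of primary spectra. The sketch below uses plain Euclidean multiplicative updates as a stand-in for the poster's perceptually-driven objective; shapes, names, and the rank are illustrative.

```python
import numpy as np

# NMF factorization of a spectral image S (pixels x wavelengths) into
# nonnegative per-pixel weights P and k primary spectra W: S ~= P @ W.

def factor_spectra(S, k=3, n_iter=500, eps=1e-9):
    n, m = S.shape
    rng = np.random.default_rng(0)
    P = rng.random((n, k))
    W = rng.random((k, m))
    for _ in range(n_iter):
        P *= (S @ W.T) / (P @ W @ W.T + eps)   # update pixel weights
        W *= (P.T @ S) / (P.T @ P @ W + eps)   # update primary spectra
    return P, W
```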

Student Bios: Isaac Kauvar and Samuel Yang are PhD candidates, Liang Shi is an MS student in Electrical Engineering, and Ian McDowall works at Intuitive Surgical and Fakespace Labs.


Title: Inexpensive LED Video Wall Project

Authors: Matt Lathrop & Stephen Hitchcock

Abstract: We created a large, modular LED video display to be used in a variety of activities, from concerts to theatrical productions to art installations. The wall is made up of 50 4’ x 4’ panels, for a total size of 20’ x 40’. LED technology has always been expensive, primarily due to the high costs associated with producing batches of quality LEDs that create a uniform image. This video wall was made for roughly 1/10th the cost of a professional product of similar pixel density. The reduction in cost was enabled by using inexpensive LEDs and then imaging our panels with a dSLR to measure the luminance of each LED, which resulted in all the LEDs producing the same colors. Furthermore, we used a color spectrometer to record the gamut, white point, and gamma of the LEDs. With this data we mapped the sRGB color space into the color space of the LED wall, allowing us to produce content and then display it on the wall while preserving the colors in the final image. These techniques, combined with the hardware and software design, produced a professional-looking video wall for a fraction of the cost of alternatives.
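
The gamut-mapping step described above can be summarized in a single 3x3 matrix. Assuming the spectrometer yields the XYZ coordinates of the wall's red, green, and blue primaries (the "measured" matrix below is a made-up placeholder), linear sRGB values map to wall drive values as follows; gamma handling and out-of-gamut treatment are omitted.

```python
import numpy as np

# Map linear sRGB values into the LED wall's primaries via one 3x3 matrix.

M_SRGB = np.array([[0.4124, 0.3576, 0.1805],    # standard sRGB RGB -> XYZ (D65)
                   [0.2126, 0.7152, 0.0722],
                   [0.0193, 0.1192, 0.9505]])

M_WALL = np.array([[0.45, 0.30, 0.18],          # XYZ of wall primaries as
                   [0.24, 0.65, 0.11],          # "measured" by spectrometer
                   [0.02, 0.15, 0.83]])         # (placeholder values)

SRGB_TO_WALL = np.linalg.inv(M_WALL) @ M_SRGB   # sRGB -> XYZ -> wall drive

def srgb_linear_to_wall(rgb_linear):
    """Convert a linear sRGB triple to wall drive values, clipped to gamut."""
    return np.clip(SRGB_TO_WALL @ np.asarray(rgb_linear), 0.0, 1.0)
```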

Student Bios: 

Matt Lathrop is a senior at Stanford University studying Computer Science (graphics) and Theatre (lighting design). Matt has conducted two major research projects bringing together computer science and lighting design while at Stanford: the first was the Remote Controlled Follow Spots project published in USITT’s TD&T Magazine, and the second was the Low-Cost LED Video Wall project. Matt has presented at conferences including BIB ISDSWE 2014 and the International Lighting Symposium Hong Kong 2015.

Stephen Hitchcock is a sophomore at Stanford University studying Computer Science (graphics) and Theatre (lighting design). Stephen participated in a research project last year aimed at developing a low cost LED display and presented his findings at the International Lighting Symposium Hong Kong 2015. He is currently developing a new protocol for theatrical systems integration.


Title: Computational Lithography for Single Exposure 2.5D Printing

Authors: Leandra Brickson, Gordon Wetzstein, Matthew O’Toole

Abstract: Excitement about 3D printing as a rapid prototyping and low-volume production method has been growing steadily throughout the last decade. Among 3D printing techniques, stereolithography (SLA) has been shown to be one of the highest-resolution techniques, leading to smooth, high-detail structures. However, the SLA process is very slow due to the layer-by-layer nature of the print. The printing speed of SLA can be dramatically increased if the resin curing process is more precisely controlled to allow a thicker layer, or single-layer exposure. This poster looks at creating a resin-curing model based on Dill absorption parameters to predict resin growth for single-exposure height-profile (2.5D) printing using SLA methods. This model is compared to experimental data, and basic curing predictions are explored. Once refined, this model can be expanded to 3D applications to allow single-exposure 3D printing.
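
For readers unfamiliar with Dill parameters, the underlying exposure model couples Beer-Lambert absorption with bleaching of the photoactive component. The sketch below is a generic textbook-style discretization of that model, not the poster's calibrated resin-growth model; all parameter values are invented placeholders.

```python
import numpy as np

# Dill-style exposure model: intensity attenuates with depth through the
# resin (absorption A*M + B), and the photoactive component M bleaches
# at a rate proportional to local intensity (Dill C parameter).

def dill_exposure(depth_um=200.0, t_total=5.0, I0=1.0,
                  A=0.01, B=0.002, C=0.5, nz=400, nt=500):
    """Return depth samples z and remaining photoactive fraction M(z)
    after exposure; cure depth is roughly where M has dropped enough."""
    z = np.linspace(0.0, depth_um, nz)
    dz, dt = z[1] - z[0], t_total / nt
    M = np.ones(nz)                                 # unbleached at t = 0
    for _ in range(nt):
        alpha = A * M + B                           # local absorption coeff.
        I = I0 * np.exp(-np.cumsum(alpha) * dz)     # Beer-Lambert with depth
        M *= np.exp(-C * I * dt)                    # bleaching this time step
    return z, M
```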

Student Bios: Leandra Brickson is a 1st year Electrical Engineering PhD student with previous experience in nano-electronic and photonic device fabrication. She comes from NC State University with a master’s in Electrical Engineering; her previous work includes interference-lithography-based nanowire gas sensors and multi-layered liquid crystal structures for arbitrary phase-patterned elements.


Title: Art++: Augmented reality in museums

Authors: Jean-Baptiste Boin, David Chen, Skanda Shridhar, Bernd Girod

Abstract: Augmented reality (AR) is a fast-growing field that enables brand new types of interaction with the world. Unlike virtual reality, which takes the user into a completely different world, AR is deeply rooted in the user’s real environment. The Art++ project aims at building an AR-based art museum guide for mobile devices. While many of the algorithms used to enable AR already exist, considerable work was required to make them viable in real time on mobile devices with relatively limited memory and computational resources. The system we present uses a highly memory-efficient image retrieval pipeline to identify and locate a painting. This is combined with a robust, fast tracking algorithm that allows for a seamless augmented reality experience. Art++ was started in September 2014 and is funded by a Magic Grant from the Brown Institute for Media Innovation. It is a collaboration with the Cantor Arts Center.

Student Bios: Jean-Baptiste Boin is a PhD candidate in the department of Electrical Engineering at Stanford, advised by Prof. Bernd Girod. His research interests lie in image retrieval and computer vision, in particular augmented and virtual reality. He is a founding member and the main developer of the Art++ project, a mobile augmented reality guide for museums. This project, started in September 2014, is funded by a Magic Grant from the Brown Institute for Media Innovation, and is a collaboration with the Cantor Arts Center.


Title: Reality Informed VR

Authors: Matt Vitelli

Abstract:  This project utilizes RGB-D sensors as a means of synthesizing a live 3D mesh of the real world that can be displayed on consumer HMDs. The mesh can be used in conjunction with existing VR environments to create mixed reality experiences and apply augmentations on top of real world scenery. The project makes use of a variety of computer vision and computer graphics techniques to create a compelling user experience.

Student Bios:  Matt Vitelli is a Masters student in Computer Science. He has worked on a variety of projects related to robotics, computer vision, computer graphics, and deep learning. His interests lie primarily at the intersection of art and technology.


Title: Applying a computer vision object tracking algorithm to detect musicians’ ancillary gestures

Authors: Madeline Huberth

Abstract: When playing music, musicians’ body movements relate not only to playing their instrument directly, but also to their expressive intentions and conceptualization of musical structure. The latter type of motion is termed an ancillary gesture; such gestures have been shown to correspond with the level of the phrase, especially in performers’ head and torso. In this project, I apply and evaluate an object tracking algorithm (Deshmukh & Gholap, 2012) on prerecorded videos for the purpose of tracking colored markers (1-in squares of retroreflective tape) on a musician’s body. Testing the algorithm on front- and side-view recordings of 13 cellists playing the same music, I observe when the algorithm fails to accurately track the markers, and describe subsequent color manipulations applied to the videos to improve algorithm performance. Further observations, such as algorithm speed across different machines and the effect of camera frame rate on accuracy, will be discussed. Using the xy-pixel data of the tracked markers, I will also describe preliminary findings on the intended research goal of the application: characterizing musicians’ movements when they conceptualize short melodic groupings rather than long phrases.
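
For readers unfamiliar with this class of method, the sketch below shows a generic color-marker tracker of the kind being evaluated: threshold each frame in HSV around the marker color and take the centroid of the largest blob. The threshold bounds are illustrative, and this is not the specific Deshmukh & Gholap algorithm.

```python
import cv2
import numpy as np

# Generic color-marker tracking: HSV threshold + largest-blob centroid.

def track_marker(frame_bgr, lo=(0, 120, 120), hi=(10, 255, 255)):
    """Return the (x, y) centroid of the largest blob matching the
    HSV bounds, or None if no blob is found."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    c = max(contours, key=cv2.contourArea)
    m = cv2.moments(c)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])  # blob centroid
```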

Student Bios: Madeline Huberth is the Geballe Graduate Fellow (SIGF) and a PhD Candidate at Stanford University’s Center for Computer Research in Music and Acoustics. Her research intersects music psychology and the study of gesture in performance, exploring production and perception of polyphony using EEG, motion capture, and behavioral studies. She also performs and composes for the Stanford Laptop Orchestra, with a focus on gesture in new computer-mediated instruments. Prior to coming to Stanford, she received a B.M. in Cello Performance and a B.S. in Interdisciplinary Physics from the University of Michigan, and her masters from the University of Cambridge as a Gates Cambridge Scholar.


Title:  Identifying Endangered Right Whales from Aerial Photographs

Authors: Catherine Mullings, Qingping He

Abstract: North Atlantic right whales are an endangered species; fewer than 500 exist today. To protect the species from extinction, specialized researchers need to monitor each individual right whale. To do so, they conduct aerial surveys to photograph the whales and then manually identify individuals from the photographs. With only a few specialized researchers and a limited budget, this identification process is time consuming and detracts from conservation efforts. Participating in this Kaggle competition, we aim to automate the right whale recognition process; that is, given an aerial photograph of a whale, to identify which of the ~500 living right whales it shows.

Student Bios: Catherine Mullings is a coterminal student in computer science with interests in computer graphics, computer vision, and image processing.


Title:  Automating the Design of Game Visualizations

Authors: Abhijeet Mohapatra, Michael Genesereth

Abstract: We present a novel system called Merlin that automates the design of an arbitrary game’s visualization from its description. In our work, we focus on games described using a formal language called the Game Description Language (GDL). Merlin discovers underlying game concepts such as boards and pieces by computing “invariant projections” of game states, and automatically generates visualizations for these concepts. Merlin also allows different visualizations of the game concepts to be composed, thereby generating different, potentially new visualizations for games. This composition can be leveraged by game artists to incrementally improve their existing visualizations.

Student Bios: Abhijeet Mohapatra is a Ph.D. candidate in the Computer Science Department at Stanford University. He received his B.Tech in Computer Science from I.I.T Kharagpur in India. His research focuses on supporting aggregates in logic programs. He is also very interested in developing effective and efficient interfaces to visualize databases and workflows, and has developed tools in these domains, including (a) Dexter, a browser-based tool that allows users to pose ad hoc queries on web-accessible structured data, and (b) tPrime, a browser-based tool that allows users to aggregate and visualize structured data through an intuitive graphical interface, in a manner similar to zooming in and out of maps. His interest in workflow visualization was triggered by his work on automatically generating visualizations for games in general game playing.