SCIEN Affiliates Meeting 2016 Poster Presentations
Interactive Dynamic Video: Abe Davis, Doug James
A Wide-Field-of-View Monocentric Light Field Camera: Donald G. Dansereau, Glenn Schuster, Joseph Ford, Gordon Wetzstein
Wearable Skin Deformation as Force Feedback in Virtual Reality: Samuel B. Schorr and Allison M. Okamura
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes: Angela Dai, Angel Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Niessner
Making Virtual Reality Better than Reality: Nitish Padmanaban, Robert Konrad, Tal Stramer, Emily Cooper, Gordon Wetzstein
Realistic Camera Simulation for Machine Learning Applications: Henryk Blasinski, Trisha Lian, Joyce Farrell and Brian Wandell
3D Semantic Parsing of Large-Scale Indoor Spaces: Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, Silvio Savarese
A Simulation Toolbox for Prototyping Imaging Systems: Trisha Lian, Henryk Blasinski and Brian Wandell
Aperture interference light field microscopy for fluorescence volumetric imaging: Isaac Kauvar, Julie Chang, Gordon Wetzstein
Speckle-Free Coherence Tomography of Turbid Media: Orly Liba, Matthew D. Lew, Elliott D. SoRelle, Rebecca Dutta, Derek Yecies, Debasish Sen, Darius M. Moshfeghi, Steven Chu, Adam de la Zerda
Capsule Ultrasound Device: Spyridon Baltsavias, Farah Memon, Junyi Wang, Gerard Touma, Morten Rasmussen, Chienliu Chang, Eric Olcott, R. Brooke Jeffrey, Butrus T. Khuri-Yakub, and Amin Arbabian
Saliency in VR: How do people explore virtual reality scenes?: Vincent Sitzmann, Ana Serrano, Amy Pavel, Maneesh Agrawala, Diego Gutierrez, Gordon Wetzstein
The Field of View Available to the Cortical Reading Circuitry: Rosemary Le, Nathan Witthoft, Michal Ben-Shachar and Brian Wandell
Automated methods for identification and avoidance of axon bundle activation for epi-retinal prosthesis: Nandita Bhaskhar, Karthik Ganesan, Lauren Grosberg, E.J. Chichilnisky, Subhasish Mitra
Cortical areas encoding visual segmentation cues from relative motion and relative disparity: Peter J. Kohler, Benoit Cottereau & Anthony M. Norcia
3D-R2N2: recurrent reconstruction neural network: Christopher B. Choy, Danfei Xu, JunYoung Gwak, Silvio Savarese
Microanatomical differences predict functional differences in human ventral visual cortex: Mona Rosenke, Kevin S. Weiner, Michael Barnett, Karl Zilles, Katrin Amunts, Rainer Goebel, Kalanit Grill-Spector
ISETBio: A Computational Engine for Modeling the Early Visual System: James Golden, David Brainard, E.J. Chichilnisky, Fred Rieke, Joyce Farrell, Nicolas Cottaris, Haomiao Jiang, Xiaomao Ding, Ben Heasley, Jonathan Winawer, Brian Wandell
ProxImaL: Efficient Image Optimization using Proximal Algorithms: Felix Heide, Steven Diamond, Matthias Niessner, Jonathan Ragan-Kelley, Wolfgang Heidrich, Gordon Wetzstein
Title: Interactive Dynamic Video
Authors: Abe Davis, Doug James
Abstract: One of the most important ways that we experience our environment is by manipulating it: we push, pull, poke, and prod to test hypotheses about our surroundings. By observing how objects respond to forces that we control, we learn about their dynamics. Unfortunately, regular video does not afford this type of manipulation – it limits us to observing what was recorded. In this work we present algorithms for turning regular video of vibrating objects into dynamic videos that users can interact with. By analyzing subtle vibrations in video, we can extract plausible, image-space simulations of physical objects. We show how these simulations can be used for interaction, as well as for low-cost special effects and structural analysis.
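The interactive simulation described here drives vibration modes recovered from video with user-applied forces; each mode behaves like a damped harmonic oscillator. A minimal single-mode sketch of that generic physics (not the authors' implementation; all parameter names are illustrative):

```python
import numpy as np

def modal_response(freq_hz, damping, force, dt=1 / 60):
    """Simulate one vibration mode's response to a user-applied force.

    freq_hz: the mode's natural frequency; damping: its damping ratio;
    force: a per-frame sequence of applied forces. Integrated with
    semi-implicit Euler. A generic damped-oscillator sketch only.
    """
    w = 2 * np.pi * freq_hz
    x, v = 0.0, 0.0
    xs = []
    for f in force:
        a = f - 2 * damping * w * v - w * w * x  # modal acceleration
        v += a * dt
        x += v * dt
        xs.append(x)
    return np.array(xs)
```

In the full system, the scalar trajectory of each mode would weight an image-space deformation basis to warp the video frame.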
Bio: Abe Davis recently earned his PhD from MIT (fall 2016), where he was advised by Fredo Durand, before moving to Stanford to do a postdoc with Prof. Doug James. Abe’s PhD dissertation, “Visual Vibration Analysis,” which won the MIT Sprowls award for best doctoral thesis in computer science, focused on analyzing subtle vibrations in video to enable a variety of applications, ranging from recovering sound from silent video (the visual microphone) to estimating material properties, structural analysis, and even simulating objects in video. In 2015, Business Insider listed Abe as one of “The 8 most innovative scientists in tech and engineering,” and in 2016 Abe was included in Forbes’ annual “30 under 30” list.
Title: Speckle-Free Coherence Tomography of Turbid Media
Authors: Orly Liba, Matthew D. Lew, Elliott D. SoRelle, Rebecca Dutta, Derek Yecies, Debasish Sen, Darius M. Moshfeghi, Steven Chu, Adam de la Zerda
Abstract: Optical coherence tomography (OCT) is a powerful biomedical imaging technology that relies on the coherent detection of backscattered light to image tissue morphology in vivo. As a consequence, OCT is susceptible to coherent noise (speckle noise), which imposes significant limitations on its diagnostic capabilities. Here we show a method based purely on light manipulation that is able to entirely remove the speckle noise originating from turbid samples without any compromise in resolution. We refer to this method as Speckle-Free OCT (SFOCT). Using SFOCT, we succeeded in revealing small structures that are otherwise hidden by speckle noise when using conventional OCT, including the inner stromal structure of a live mouse cornea, the fine structures inside the mouse pinna, sweat ducts and Meissner’s corpuscles in human fingertip skin, and white matter fascicles in the brain of a live mouse. SFOCT has the potential to markedly increase OCT’s capabilities for diagnosing various human diseases by revealing minute features that correlate with early pathology.
Bio: Orly Liba is a 4th year PhD candidate in the Department of Electrical Engineering and a researcher in the de la Zerda lab at Stanford University. Orly received a B.Sc. in Electrical Engineering and a B.A. in Physics from the Technion, Israel, and an M.Sc. in Electrical Engineering from Tel-Aviv University, Israel. She is currently interested in developing computational and optical tools for medical imaging with optical coherence tomography (OCT) and is also interested in computational imaging and photography.
Title: Wearable Skin Deformation as Force Feedback in Virtual Reality
Authors: Samuel B. Schorr and Allison M. Okamura
Abstract: One of the main barriers to immersivity during object manipulation in virtual reality is the lack of realistic haptic feedback. Our goal is to convey compelling interactions with virtual objects, such as grasping, squeezing, pressing, lifting, and stroking, without requiring a bulky, world-grounded kinesthetic feedback device (traditional haptics) or the use of predetermined passive objects (haptic retargeting). To achieve this, we use a pair of finger-mounted haptic feedback devices that deform the skin on the fingertips to convey cutaneous force information from object manipulation. We show that users can perceive differences in virtual object weight and that they apply increasing grasp forces when lifting virtual objects as rendered mass is increased. Moreover, we show how naive users perceive changes of a virtual object’s physical properties when we use skin deformation to render objects with varying mass, friction, and stiffness. These studies demonstrate that wearable skin deformation devices can provide a compelling, large-workspace haptic experience appropriate for virtual reality scenarios involving object manipulation.
Bio: Samuel B. Schorr received the B.S. in Mechanical Engineering from the University of Texas, Austin, in 2011, and the M.S. in Mechanical Engineering from Stanford University, Stanford, CA, in 2013. He is currently pursuing a doctoral degree in the Department of Mechanical Engineering, Stanford University, Stanford, CA. His research interests include haptics, virtual reality, teleoperation, medical robotics, and novel uses of sensory substitution methods. (His coauthor, Allison M. Okamura, is a Professor of Mechanical Engineering at Stanford University.)
Title: The Field of View Available to the Cortical Reading Circuitry
Authors: Rosemary Le, Nathan Witthoft, Michal Ben-Shachar and Brian Wandell
Abstract: Skilled reading requires rapidly recognizing letters and word forms; people learn this skill best for words presented in the central visual field. Measurements over the last decade have shown that when children learn to read, responses within ventral occipito-temporal cortex (VOT) become increasingly selective to word forms. We call these regions the VOT reading circuitry (VOTRC). The portion of the visual field that evokes a response in the VOTRC is called the field of view (FOV). We measured the FOV of the VOTRC and found that it is a small subset of the entire field of view available to the human visual system. For the typical subject, the FOV of the VOTRC in each hemisphere is contralaterally and foveally biased. The FOV of the left VOTRC extends ~9° into the right visual field and ~4° into the left visual field along the horizontal meridian. The FOV of the right VOTRC is roughly mirror symmetric to that of the left VOTRC. The size and shape of the FOV covers the region of the visual field that contains relevant information for reading English. It may be that the size and shape of the FOV, which varies between subjects, will prove useful in predicting behavioral aspects of reading.
Bios: Rosemary Le is a Ph.D. student in Psychology. (Her co-authors are Nathan Witthoft, a post-doctoral research associate; Michal Ben-Shachar, a senior lecturer of English literature and linguistics at Bar-Ilan University; and Brian Wandell, Professor of Psychology and, by courtesy, of Electrical Engineering, Ophthalmology, and Radiology.)
Title: A Wide-Field-of-View Monocentric Light Field Camera
Authors: Donald G. Dansereau, Glenn Schuster, Joseph Ford, Gordon Wetzstein
Abstract: Light field (LF) capture and processing are important in an expanding range of computer vision applications, offering rich textural and depth information and simplification of conventionally complex tasks. Although LF cameras are commercially available, no existing device offers wide field-of-view (FOV) imaging. This is due in part to the limitations of fisheye lenses, for which a fundamentally constrained entrance pupil diameter severely limits depth sensitivity. In this work we describe a novel, compact optical design that couples a monocentric lens with multiple sensors using microlens arrays, allowing LF capture with an unprecedented FOV. Leveraging capabilities of the LF representation, we propose a novel method for efficiently coupling the spherical lens and planar sensors, replacing expensive and bulky fiber bundles. We construct a single-sensor LF camera prototype, rotating the sensor relative to a fixed main lens to emulate a wide-FOV multi-sensor scenario. Finally, we describe a processing toolchain, including a convenient spherical LF parameterization, and demonstrate depth estimation and post-capture refocus for indoor and outdoor panoramas with 15 x 15 x 1600 x 200 pixels (72 MPix) and a 138-degree FOV.
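Post-capture refocus of the kind demonstrated in this work is commonly implemented by shift-and-sum over sub-aperture images. A minimal sketch of that standard technique on a planar 4D light field (not the authors' spherical parameterization or toolchain):

```python
import numpy as np

def refocus(lightfield, slope):
    """Shift-and-sum refocus of a 4D light field L[u, v, y, x].

    Each sub-aperture image is shifted in proportion to its angular
    coordinate (u, v) and the chosen depth parameter `slope`, then
    averaged; slope = 0 keeps the nominal focal plane. Integer-pixel
    shifts only, for brevity.
    """
    U, V, Y, X = lightfield.shape
    uc, vc = (U - 1) / 2, (V - 1) / 2
    out = np.zeros((Y, X))
    for u in range(U):
        for v in range(V):
            dy = int(round(slope * (u - uc)))
            dx = int(round(slope * (v - vc)))
            out += np.roll(lightfield[u, v], (dy, dx), axis=(0, 1))
    return out / (U * V)
```

Objects whose disparity matches `slope` align across views and stay sharp, while everything else is averaged into blur, which is why the entrance pupil diameter mentioned above matters for depth sensitivity.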
Bio: Donald Dansereau joined the Stanford Computational Imaging Lab as a postdoctoral scholar in September 2016. His research is focused on computational imaging for robotic vision, and he is the author of the open-source Light Field Toolbox for Matlab. Dr. Dansereau completed B.Sc. and M.Sc. degrees in electrical and computer engineering at the University of Calgary in 2001 and 2004, receiving the Governor General’s Gold Medal for his work in light field processing. His industry experience includes physics engines for video games, computer vision for microchip packaging, and FPGA design for high-throughput automatic test equipment. In 2014 he completed a Ph.D. in plenoptic signal processing at the Australian Centre for Field Robotics, University of Sydney, and in 2015 he joined the Australian Centre for Robotic Vision at the Queensland University of Technology, Brisbane, as a research fellow. Donald’s field work includes marine archaeology on a Bronze Age city in Greece, seamount and hydrothermal vent mapping in the Sea of Crete and Aeolian Arc, habitat monitoring off the coast of Tasmania, and hydrochemistry and wreck exploration in Lake Geneva.
Title: Making Virtual Reality Better than Reality
Authors: Nitish Padmanaban, Robert Konrad, Tal Stramer, Emily Cooper, Gordon Wetzstein
Abstract: From the desktop to the laptop to the mobile device, personal computing platforms evolve over time. Moving forward, wearable computing is widely expected to be integral to consumer electronics and beyond. The primary interface between a wearable computer and a user is often a near-eye display. But current-generation near-eye displays suffer from multiple limitations: they are unable to provide fully natural visual cues and comfortable viewing experiences for all users. At their core, many of the issues with near-eye displays are caused by limitations in conventional optics. Current displays cannot reproduce the changes in focus that accompany natural vision, and they cannot support users with uncorrected refractive errors. With two prototype near-eye displays, we demonstrate how these issues can be overcome using display modes that adapt to the user via computational optics. By employing focus-tunable lenses, mechanically actuated displays, and mobile gaze tracking technology, these displays can be tailored to correct common refractive errors and provide natural focus cues by dynamically updating the system based on where a user looks in a virtual scene. Indeed, the opportunities afforded by recent advances in computational optics open up the possibility of creating a computing platform in which some users may experience better quality vision in the virtual world than in the real one.
Bios: Robert is a 3rd year PhD candidate in the Electrical Engineering Department at Stanford University, advised by Professor Gordon Wetzstein. His research interests lie at the intersection of computational displays and human physiology with a specific focus on virtual and augmented reality systems. He is specifically interested in the vergence-accommodation and visual-vestibular conflicts present in current VR and AR displays. He received his Bachelor’s Degree from the ECE department at the University of Toronto in 2014, and his Master’s Degree from the EE Department at Stanford University in 2016.
Nitish is a second year PhD student in the Stanford Computational Imaging lab. He did his undergraduate degree at UC Berkeley focusing on signal processing, and now works on opto-computational displays for VR. In particular, he has spent the last year working on building and evaluating displays to alleviate the vergence-accommodation conflict (VAC), and is now investigating the role of the vestibular system in causing simulator sickness in VR.
Title: Capsule Ultrasound Device
Authors: Spyridon Baltsavias, Farah Memon, Junyi Wang, Gerard Touma, Morten Rasmussen, Chienliu Chang, Eric Olcott, R. Brooke Jeffrey, Butrus T. Khuri-Yakub, and Amin Arbabian
Abstract: We are developing a swallowable capsule ultrasound (CUS) device to serve as a wireless, disposable, ultrasonic imager for investigating the multiple layers of the complete gastrointestinal (GI) tract. Offering a 360-degree FOV and a 5 cm penetration depth, our device will generate large imaging datasets and algorithmically reconstruct clinically valuable visualizations of the digestive system, surpassing the limitations of existing optical capsule endoscopes and enabling rapid screening for lesions, cancerous tissue, and other diseases. The core components of our device are a conformal capacitive-micromachined ultrasonic transducer (CMUT) array, a low-power front-end electronics chip with control and readout capabilities, and an efficient wireless data transmitter – all of which have been successfully fabricated and are currently being integrated and packaged to develop the proof-of-concept system. We believe this technology could serve as a platform for enabling a multitude of innovative wearable, ingestible, and implantable applications in the context of medical diagnosis, imaging, treatment and monitoring for the nascent IoT world.
Bios: Spyridon Baltsavias is a Ph.D Student in Electrical Engineering, advised by Prof. Amin Arbabian. His primary research interests lie in the interface between electronics and the human body and include wireless power delivery, data telemetry, and sensor design for biomedical devices.
Farah Memon is a Ph.D Student in Bioengineering, advised by Prof. Butrus Khuri-Yakub. Her interests include the development of medical imaging devices, using MEMS-based sensors, for diagnostic and therapeutic applications.
Title: Automated methods for identification and avoidance of axon bundle activation for epi-retinal prosthesis
Authors: Nandita Bhaskhar, Karthik Ganesan, Lauren Grosberg, E.J. Chichilnisky, Subhasish Mitra
Abstract: Retinal prostheses are advanced electro-neural interfaces for treating blindness due to photoreceptor degeneration, which affects tens of millions of people worldwide. Retinal circuitry consists of multiple cell types, with each type forming an orderly lattice (mosaic) and carrying a distinct, temporally precise representation of the visual world, forming a native neural code. Current solutions provide only coarse-resolution stimulation over a limited field of view, with no regard for the location or natural response properties of individual cells; this leads to gross activation of retinal ganglion cells (RGCs) together with unintended activation of axon bundles. Indiscriminate stimulation of these bundles produces a signal that is known to significantly degrade artificial vision in patients, preventing epiretinal prostheses from advancing. To address this problem, we have developed automated methods to detect, and hence avoid, bundle activation. Results obtained using this method match closely with human estimates of axon bundle activation thresholds. We envision this being used in a closed-loop retinal prosthesis system.
Bio: Nandita is a PhD candidate in the EE department at Stanford and a member of the Robust Systems Lab headed by Professor Subhasish Mitra, working in collaboration with Professor E.J. Chichilnisky of the Neurosurgery department at Stanford. She received her B.Tech with Honours from the Indian Institute of Information Technology, Kancheepuram, India, and her Master’s degree from Stanford University. Her research is on developing an implantable epiretinal prosthesis that can fully or partially restore vision through electrical stimulation to people blinded by photoreceptor loss. She believes that brain-machine interfaces are one of the best uses of technology to improve the quality of life. Apart from her research, she is also highly interested in sensors, systems, and circuits and their applications in interdisciplinary fields, as well as travelling, creative writing, music, getting lost, hiking, biking, and exploring new things.
Title: Saliency in VR: How do people explore virtual reality scenes?
Authors: Vincent Sitzmann, Ana Serrano, Amy Pavel, Maneesh Agrawala, Diego Gutierrez, Gordon Wetzstein
Abstract: Understanding how humans explore virtual environments is crucial for many applications, such as developing compression algorithms or designing effective cinematic VR content, as well as for building predictive computational models. We have recorded 780 head and gaze trajectories from 86 users exploring omni-directional stereo panoramas using virtual reality (VR) head-mounted displays. By analyzing the interplay between visual stimuli, head orientation, and gaze direction, we demonstrate patterns and biases in how people explore these panoramas, and we present first steps toward predicting time-dependent saliency. To compare how visual attention and saliency in VR differ from conventional viewing conditions, we also recorded users observing the same scenes in a desktop setup. Based on this data, we show how to adapt existing saliency predictors to VR, so that insights and tools developed for predicting saliency in desktop scenarios may transfer directly to these immersive applications.
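One concrete issue when adapting desktop saliency predictors to VR panoramas is that the equirectangular projection over-represents the sphere near the poles. A hypothetical normalization step illustrating the idea (not the authors' actual method):

```python
import numpy as np

def equirect_saliency_normalize(saliency):
    """Turn a per-pixel saliency map on an equirectangular panorama
    into a distribution over the viewing sphere.

    Each image row is weighted by cos(latitude), the solid angle it
    actually subtends, so polar rows no longer dominate. Illustrative
    adaptation step only.
    """
    H, W = saliency.shape
    lat = (np.arange(H) + 0.5) / H * np.pi - np.pi / 2  # -pi/2 .. pi/2
    weights = np.cos(lat)[:, None]                      # per-row solid angle
    weighted = saliency * weights
    return weighted / weighted.sum()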
Bio: Vincent Sitzmann is a graduate student in Computer Science working in Prof. Wetzstein’s Computational Imaging and Display Laboratory at Stanford University. He is focusing on computational imaging and displays as well as computer vision.
Title: Realistic Camera Simulation for Machine Learning Applications
Authors: Henryk Blasinski, Trisha Lian, Joyce Farrell and Brian Wandell
Abstract: We are designing a computational environment to simulate many different types of imaging systems, including light field cameras, multispectral imaging systems and surround video. In this poster, we describe how we can use these computational tools to create a database of camera images for machine learning applications such as autonomous driving. Our simulations define the “ground truth” for 3D spectral radiance scene data in real physical units, including the effects of non-uniform illumination and different atmospheric conditions. We model different types of optical filters and lenses and calculate the digital output from imaging sensors. This makes it possible to explore how hardware design choices affect algorithmic performance.
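As an illustration of the sensor-output stage of such a pipeline, a toy model mapping scene photon rates to digital numbers might look like the following; the function, parameter names, and default values are illustrative stand-ins, not the actual simulation API:

```python
import numpy as np

def simulate_sensor(photon_rate, exposure_s=0.01, qe=0.6,
                    read_noise_e=2.0, gain_e_per_dn=0.5,
                    bit_depth=10, rng=None):
    """Toy sensor model: photons -> electrons -> digital numbers.

    photon_rate: mean photons/pixel/second at the sensor plane.
    Shot noise is Poisson, read noise is Gaussian, and the ADC clips
    and quantizes. All parameters here are hypothetical examples.
    """
    rng = np.random.default_rng(rng)
    mean_photons = photon_rate * exposure_s
    electrons = rng.poisson(mean_photons * qe).astype(float)  # shot noise
    electrons += rng.normal(0.0, read_noise_e, electrons.shape)  # read noise
    dn = np.clip(electrons / gain_e_per_dn, 0, 2 ** bit_depth - 1)
    return np.round(dn).astype(int)
```

Because every stage is explicit, hardware choices such as quantum efficiency or bit depth can be swept and their effect on a downstream algorithm measured, which is the design-exploration loop the abstract describes.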
Bio: Henryk is pursuing his Ph.D. in the Department of Electrical Engineering. His research interests include computational photography, computer vision and machine learning. Henryk’s work includes designing image systems and algorithms that operate in the spectral domain in order to recover wavelength-dependent surface reflectance and fluorescence properties. His work in underwater imaging extends these ideas by investigating the influence of scattering and absorbing media on the image formation model. Henryk is also a recipient of a Brown Institute Magic Grant, where he applies these tools and techniques to solve a practical problem: analyzing coral reef imagery from consumer cameras.
Title: A Simulation Toolbox for Prototyping Imaging Systems
Authors: Trisha Lian, Henryk Blasinski and Brian Wandell
Abstract: Advances in hardware development have made it possible to build novel imaging systems, each capturing unique scenes and utilizing unique designs. We introduce a toolbox that can simulate a full imaging system pipeline, starting from a virtual 3D scene and ending with a processed image on a display. Through this simulation, we can vary parameters in order to prototype imaging systems and explore design trade-offs. We can use our system to 1) simulate camera rigs for 360° capture and 2) render through the optics of the human eye.
Bio: Trisha Lian is a PhD student in Electrical Engineering working with Professor Brian Wandell. She received her Bachelor’s Degree in Biomedical Engineering at Duke University in 2014.
Title: Cortical areas encoding visual segmentation cues from relative motion and relative disparity
Authors: Peter J. Kohler, Benoit Cottereau & Anthony M. Norcia
Abstract: When confronted with dynamic scenes, the visual system needs to determine which elements of a scene form objects, and which elements belong to the background. To perform this figure-ground segmentation, the visual system relies on a number of cues, including differences in binocular disparity (relative disparity) and motion direction (relative motion). These two cues strongly co-vary at object boundaries, and visual areas that encode both cues may combine them to support a more robust representation of objects and surfaces. We used functional MRI to compare responses to the two cues across human visual cortex. Participants viewed dichoptic displays through anaglyph glasses, in which a central disc-shaped region, defined using either relative disparity or relative motion, periodically appeared and disappeared. Both the relative motion and relative disparity displays were generated using random dots undergoing oscillatory lateral motion – the only difference was that the motion was in-phase between the eyes for relative motion, and in anti-phase for relative disparity. By measuring the changes in fMRI BOLD signal associated with the appearance and disappearance of the central region, we were able to characterize responses within several topographically organized regions-of-interest (ROIs), defined for each participant (n=15). Because these ROIs cover the majority of visual cortex, we were able to track the sensitivity to the two cues throughout the visual processing hierarchy. Responses to motion cues were seen as early as primary visual cortex, the first cortical area to receive inputs from the two eyes, while reliable disparity responses were not seen until later stages in the hierarchy. Several regions in dorsal visual cortex were found to have strong responses to both cues, which suggests that these regions use cue combination to support segmentation.
Our results show that in terms of the sensitivity of the visual brain, segmentation cues from relative motion are at least as important as those from relative disparity, and in fact activate the very earliest cortical regions in the visual processing stream. Visual displays in which both types of cues are available to the user should lead to better figure-ground segmentation and more naturalistic perception.
Bio: Peter J. Kohler received a Bachelor’s degree in Psychology from the University of Copenhagen in 2007 and a PhD in Cognitive Neuroscience from Dartmouth College in 2013, and has been a post-doc at the Stanford Vision and Neuro-development Lab since then. He has more than 10 years of experience with research in cognitive neuroscience, specializing in the visual system of neurotypical adults: designing and executing experiments, programming stimulus displays, project management, data analysis, and communication of results. He is an expert in probing visual perception at multiple stages of cortical processing using a combination of behavioral, EEG, and functional MRI methods, and is interested in using insights from neuroscience to drive real-world innovation.
Title: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
Authors: Angela Dai, Angel Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Niessner
Abstract: One of the main challenges of modern machine learning is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very little data is available — current datasets cover a small range of scene views and have limited semantic annotations. To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1500 scenes annotated with 3D camera poses, surface reconstructions, semantic segmentations, and CAD model placements. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation and CAD model placement. During experiments with the data, we find that it helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, 3D semantic labeling, and CAD model retrieval.
Bio: Angela Dai is a PhD candidate in Computer Science at Stanford University. She is supervised by Prof. Pat Hanrahan and works on real-time 3D reconstruction as well as 3D semantic understanding.
Title: Aperture interference light field microscopy for fluorescence volumetric imaging
Authors: Isaac Kauvar, Julie Chang, Gordon Wetzstein
Abstract: We analyze the fundamental diffraction-imposed spatio-angular resolution tradeoff of light field imaging systems, and describe coded-aperture-style acquisition schemes that can computationally reshape this tradeoff. We explore the limits of fluorescence light field microscopy (LFM), where the goal is to reconstruct a volume with high lateral and axial spatial resolution. As microscopy applications are often particularly sensitive to resolution, even more so than photography, poor resolution has hindered widespread adoption of existing LFM. We present a new design termed the Aperture-interference Light Field (ALF) microscope, and we demonstrate in simulation and with a prototype that significant resolution improvement is possible beyond conventional LFM.
Bio: Isaac Kauvar is a PhD student in Electrical Engineering, advised by Gordon Wetzstein and Karl Deisseroth. Julie Chang is a PhD student in Bioengineering, advised by Gordon Wetzstein.
Title: 3D-R2N2: recurrent reconstruction neural network
Authors: Christopher B. Choy, Danfei Xu, JunYoung Gwak, Silvio Savarese
Abstract: Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2). The network learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data. Our network takes in one or more images of an object instance from arbitrary viewpoints and outputs a reconstruction of the object in the form of a 3D occupancy grid. Unlike most previous works, our network does not require any image annotations or object class labels for training or testing. Our extensive experimental analysis shows that our reconstruction framework i) outperforms the state-of-the-art methods for single view reconstruction, and ii) enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
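The key architectural idea, a hidden state updated once per input view so that any number of views can be fused, can be caricatured with a GRU-style update on feature vectors. This is a toy sketch with stand-in weights, not the trained 3D convolutional network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def recurrent_fusion(view_features, Wz, Wh):
    """GRU-flavored fusion of per-view features into one hidden state.

    view_features: list of 1-D feature vectors, one per input view.
    Each view passes through an update gate that blends new evidence
    into the running state, so the same network handles one view or
    many. Wz, Wh are toy stand-ins for learned weights.
    """
    h = np.zeros(Wh.shape[0])
    for x in view_features:
        z = sigmoid(Wz @ x)             # update gate from the new view
        h_cand = np.tanh(Wh @ x)        # candidate state
        h = (1 - z) * h + z * h_cand    # blend old state with new evidence
    return h
```

In 3D-R2N2 the analogous state lives on a 3D grid and is decoded into per-voxel occupancy probabilities after the last view.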
Bio: Danfei Xu is currently a second year Ph.D. student in the Computer Vision & Geometry Lab at Stanford University, advised by Professor Silvio Savarese. He has broad interests in robotics, computer vision, and applied deep learning. Before joining Stanford, he received his Bachelor’s Degree in Computer Science from Columbia University.
Title: 3D Semantic Parsing of Large-Scale Indoor Spaces
Authors: Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, Silvio Savarese
Abstract: We propose a method for semantic parsing of the 3D point cloud of an entire building using a hierarchical approach: first, the raw data is parsed into semantically meaningful spaces (e.g., rooms) that are aligned into a canonical reference coordinate system. Second, the spaces are parsed into their structural and building elements (e.g., walls, columns). Performing these steps with a strong notion of global 3D space is the backbone of our method. The alignment in the first step injects strong 3D priors from the canonical coordinate system into the second step for discovering elements. This makes it possible to handle diverse, challenging scenarios, since man-made indoor spaces often exhibit recurrent regularities even when appearance features change drastically. We also argue that identification of structural elements in indoor spaces is essentially a detection problem, rather than the segmentation problem it is commonly treated as. We evaluated our method on a new dataset of several buildings with a covered area of over 6,000 sq. m. and over 215 million points, demonstrating robust results readily useful for practical applications.
Bio: Iro Armeni is a Ph.D. student in the Civil and Environmental Engineering and Computer Science departments at Stanford University. Her area of research is computer vision with a focus on semantic understanding of indoor spaces and buildings. Prior to joining Stanford, she received an MEng in Architecture and Digital Design from the University of Tokyo, an MSc in Computer Science from the Ionian University, and a Diploma in Architectural Engineering from the National Technical University of Athens. Iro is a recipient of Stanford’s Reed Fellowship, the EU Marie-Curie Fellowship and Japan’s MEXT Scholarship.
Title: Microanatomical differences predict functional differences in human ventral visual cortex
Authors: Mona Rosenke, Kevin S. Weiner, Michael Barnett, Karl Zilles, Katrin Amunts, Rainer Goebel, Kalanit Grill-Spector
Abstract: The human ventral visual stream consists of several areas considered processing stages essential for perception and recognition. A fundamental microanatomical feature differentiating visual areas is cytoarchitecture, which refers to the distribution, size, and density of cells across cortical layers. Understanding microanatomical differences across areas as well as among individuals is essential to enhance our understanding of the human brain. However, because cytoarchitectonic structure is measured in 20-micron-thick histological slices of postmortem tissue, it is difficult to assess (a) how anatomically consistent these areas are across brains and (b) how they relate to functional parcellation obtained with prevalent neuroimaging methods. The goal of this study was (a) to generate a cross-validated and optimized cytoarchitectonic atlas of the human ventral visual stream on a whole brain template that is commonly used in neuroimaging studies, and (b) to assess the commonalities between an existing functional brain atlas (Wang et al., 2014) and our cytoarchitectonic atlas of the human ventral temporal cortex. Our results shed light on the anatomical basis for functional processing. This knowledge can be utilized, among other things, to build more biologically plausible computational neural networks for visual recognition, as well as to help explain why models may work well under some conditions and not others.
Bio: Mona is a second-year PhD student in the Psychology Department at Stanford University, advised by Professor Kalanit Grill-Spector. Her research focuses on the relationship between the anatomy and function of brain regions involved in visual recognition. Specifically, she is interested in why different functional regions have different underlying cell distributions, a question she investigates using functional Magnetic Resonance Imaging (fMRI). She received her Bachelor’s degree (2012) and Master’s degree (2014) in Psychology and Neuroscience from Maastricht University.
Title: ISETBio: A Computational Engine for Modeling the Early Visual System
Authors: James Golden, David Brainard, E.J. Chichilnisky, Fred Rieke, Joyce Farrell, Nicolas Cottaris, Haomiao Jiang, Xiaomao Ding, Ben Heasley, Jonathan Winawer, Brian Wandell
Abstract: ISETBio is a freely available collaborative software resource that captures our current understanding of the physiological optics, phototransduction, and retinal processing that shape the visual signals transmitted from the eye to the brain. The first stage of the computational engine captures essential features of physiological image formation and photon capture. Models of the phototransduction process then convert the photon capture to photoreceptor signals that begin the cascade of retinal processing. Finally, models of retinal circuitry and processing convert these photoreceptor signals to output patterns of retinal ganglion cell activity, emulating the signals conveyed from the eye to the brain. This resource is accompanied by a repository of validation data, and is used to generate testable hypotheses about visual physiology and behavior using the computational-observer approach. When the computational observer cannot make a reliable discrimination, we expect that the discrimination will be beyond the capability of the human observer as well. This provides a useful guide, preventing the unnecessary cost of building devices to a specification that exceeds the limits of human perception. ISETBio has been used to simulate psychophysical threshold measurements, anomalous color vision, and perception with a retinal prosthesis. Ideally, it will evolve into a centralized platform for understanding, sharing, and applying our knowledge of the early visual pathways (github.com/isetbio).
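The computational-observer logic in the abstract — reject a device specification when even an ideal observer cannot make the discrimination — can be sketched in miniature. ISETBio itself is a MATLAB toolbox; the sketch below is a hypothetical NumPy reduction that keeps only the photon-capture stage, modeling Poisson photon noise and an ideal two-alternative discriminator (all names and stimulus values here are illustrative, not ISETBio code):

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_loglik(counts, mean):
    """Poisson log-likelihood, up to terms that cancel in the comparison."""
    return np.sum(counts * np.log(mean) - mean)

def percent_correct(mean_a, mean_b, trials=2000):
    """Ideal-observer accuracy for discriminating two stimuli whose only
    noise source is Poisson photon capture at the sensor."""
    correct = 0
    for _ in range(trials):
        true_mean = mean_a if rng.random() < 0.5 else mean_b
        counts = rng.poisson(true_mean)  # one noisy photon-capture sample
        guess_a = poisson_loglik(counts, mean_a) >= poisson_loglik(counts, mean_b)
        correct += guess_a == (true_mean is mean_a)
    return correct / trials

# A clearly different pair of stimuli is easy for the ideal observer...
easy = percent_correct(np.full(16, 100.0), np.full(16, 80.0))
# ...while a tiny contrast difference stays near chance even for the ideal
# observer, so it must also be invisible to any human observer.
hard = percent_correct(np.full(16, 100.0), np.full(16, 100.5))
```

The design point this illustrates is the abstract's guide for engineering: if `hard` is near 0.5 for the ideal observer, building hardware that renders that contrast difference exceeds the limits of human perception.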
Bio: James Golden is a postdoctoral scholar in the labs of Brian Wandell and E.J. Chichilnisky.
Title: ProxImaL: Efficient Image Optimization using Proximal Algorithms
Authors: Felix Heide, Steven Diamond, Matthias Niessner, Jonathan Ragan-Kelley, Wolfgang Heidrich, Gordon Wetzstein
Abstract: Computational photography systems are becoming increasingly diverse, while computational resources, for example on mobile platforms, are rapidly increasing. As diverse as these camera systems may be, slightly different variants of the underlying image processing tasks, such as demosaicking, deconvolution, denoising, inpainting, image fusion, and alignment, are shared across all of these systems. Formal optimization methods have recently been demonstrated to achieve state-of-the-art quality for many of these applications. Unfortunately, different combinations of natural image priors and optimization algorithms may be optimal for different problems, and implementing and testing each combination is currently a time-consuming and error-prone process. ProxImaL is a domain-specific language and compiler for image optimization problems that makes it easy to experiment with different problem formulations and algorithm choices. The language uses proximal operators as the fundamental building blocks of a variety of linear and nonlinear image formation models and cost functions, advanced image priors, and noise models. The compiler intelligently chooses the best way to translate a problem formulation and choice of optimization algorithm into an efficient solver implementation. In applications to the image processing pipeline, deconvolution in the presence of Poisson-distributed shot noise, and burst denoising, we show that a few lines of ProxImaL code can generate highly efficient solvers that achieve state-of-the-art results. We also show applications to the nonlinear and nonconvex problem of phase retrieval.
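To make the "proximal operators as building blocks" idea concrete without reproducing ProxImaL's own syntax, here is a minimal proximal-gradient (ISTA) sketch in plain NumPy for L1-regularized denoising. The function names are hypothetical and this is not the ProxImaL API; it only illustrates the split that the compiler automates: a smooth data-fidelity term handled by its gradient, and a nonsmooth prior handled by its proximal operator.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||x||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_denoise(y, lam=0.5, step=1.0, iters=100):
    """Proximal gradient (ISTA) for: min_x 0.5 * ||x - y||^2 + lam * ||x||_1.

    Each iteration takes a gradient step on the smooth data term, then
    applies the prior's proximal operator -- the same two ingredients a
    ProxImaL-style compiler assembles into a solver automatically.
    """
    x = np.zeros_like(y)
    for _ in range(iters):
        grad = x - y  # gradient of the quadratic data-fidelity term
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Usage: noise-sized entries are zeroed out, large entries are shrunk.
y = np.array([3.0, -0.2, 0.1, -2.5, 0.05])
x = ista_denoise(y)
```

Swapping the prior (for example, total variation instead of L1) only changes which proximal operator is plugged in, which is the flexibility the abstract attributes to the language.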
Bio: Felix Heide is a postdoctoral scholar at Stanford with Gordon Wetzstein. He studies computational imaging and vision systems using large-scale optimization. Before joining Gordon’s lab, he was a PhD student in Wolfgang Heidrich’s group at the University of British Columbia and at KAUST. He works closely with a startup, Algolux, to commercially realize some of these ideas on optimization for imaging in mobile devices.