The 2016 edition of the Computer Vision for Road Scene Understanding and Autonomous Driving (CVRSUAD) workshop will be held in conjunction with ECCV 2016.
Invited Speakers
- Andreas Geiger, (MPI, Germany) Title: Towards Holistic 3D Scene Understanding for Autonomous Driving. Abstract: Recent progress in self-driving vehicles suggests that within a few decades drivers could be partly or even fully replaced by autonomous systems that surpass humans in terms of availability, response time and field of view. However, fully autonomous driving places very high demands on the robustness and precision of the perception component. To overcome these challenges, rich prior knowledge (e.g., semantics, geometry, physics) must be incorporated into existing models, and low-level, mid-level and high-level vision tasks need to be solved jointly. In this talk, I will present some of our recent work towards these goals. I will start with a discussion of our model for 3D traffic scene understanding, which reasons jointly about the street layout as well as the dynamic objects in the scene. Next, I will focus on several subproblems, including omni-directional vision for intelligent vehicles, online multi-target tracking with bounded memory and computation, map-based vehicle self-localization, depth estimation using object knowledge, volumetric reconstruction, and rigidity constraints for 3D motion estimation. Finally, I will present the current state of our KITTI vision benchmark suite and our ongoing efforts on large-scale semantic instance-level scene labeling.
- Antonio M Lopez, (Computer Vision Center, Spain) Title: Learning to See in Virtual Worlds. Abstract: Most promising vision-based object detection and semantic segmentation methods rely on classifiers trained with annotated samples. However, the annotation step is a labor-intensive and subjective task worth minimizing. By using virtual worlds, we can automatically obtain large amounts of precise and rich annotations. Thus, we face the question: can a visual model learned in realistic virtual worlds operate successfully on real-world images? Experiments on several specific visual tasks show that virtual-world-based training can provide excellent testing accuracy on some real-world datasets; however, the dataset shift problem appears for many others. Accordingly, over the last years we have explored different domain adaptation ideas for several state-of-the-art object detection and semantic segmentation approaches, especially for pedestrian detection and urban semantic segmentation. In this talk, we review this work.
- Rudolf Mester, (Visual Sensorics and Information Processing, Goethe University, Germany) Title: Towards visual surround sensing: challenges, test data, and emerging methods. Abstract: The talk presents work of the VSI group (Frankfurt) and the CVL group (Linköping) in the area of visual (surround) sensing for cars, emphasizing methods for measuring visual motion reliably and for extracting 3D information from sets of trajectories over multiple frames. The algorithms presented here are characterized by a strongly predictive/recursive processing pipeline, and they involve stochastic models of vehicle dynamics, such as those presented in the companion paper [Bradler et al., CVRSUAD 2015]. We show examples of diverse new test data sets, both in a multi-monocular surround-view mode (AMUSE data set) and as very realistic synthetic sequences (COnGRATS) that include precise pixelwise ground truth for 3D depth, optical flow, surface orientation, and semantic labeling. We conclude with examples of how the diverse variants of the investigated environment perception methods (monocular, stereo, and multi-monocular) perform on real driving scene data.
- Ashesh Jain, (Department of Computer Science, Cornell University, USA) Title: Deep Learning for Sensor Rich Spatio-Temporal Problems: On Cars, Humans, and Robots. Abstract: Deep learning applications as we see them today, such as image labeling, captioning, machine translation, etc., are largely motivated by 'web' or 'internet' use cases. Modeling high-level reasoning with deep neural networks is still something we do not understand well. On the other hand, the real world around us involves complex interactions that span space and time, and modeling them efficiently is central to enabling robots and self-driving cars in traffic conditions, and to handling a variety of context-rich spatio-temporal applications. In this talk, I will first address a sensor-rich spatio-temporal application: anticipating driving maneuvers several seconds before they happen. I will present a sensory-fusion deep learning architecture for this problem. I will then present a generic framework for combining high-level reasoning expressed over spatio-temporal graphs with the learning success of deep recurrent neural networks. I will show that the same framework can be applied to cars, to modeling human motion, and to robots for understanding human-object interactions.
NVIDIA sponsored the best paper award, a TitanX board, which was awarded to:
- Direct Visual Localisation and Calibration for Road Vehicles in Changing City Environments, Geoffrey Pascoe (Univ. of Oxford), William Maddern (Univ. of Oxford), Paul Newman
Program
|8:30 – 9:30||KeyNote|
|Andreas Geiger, (MPI, Germany)
Towards Holistic 3D Scene Understanding for Autonomous Driving.
|9:30 – 10:00||Oral Session 1|
|Position Interpolation using Feature Point Scale for Decimeter Visual Localization
David Wong, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase (Nagoya Univ.).
|Direct Visual Localisation and Calibration for Road Vehicles in Changing City Environments
Geoffrey Pascoe (Univ. of Oxford), William Maddern (Univ. of Oxford), Paul Newman.
|The Statistics of Driving Sequences - and what we can learn from them.
Henry Bradler (Goethe Univ. Frankfurt), Birthe Wiegand, Rudolf Mester.
|10:00 – 10:30||Coffee Break|
|10:30 – 11:30||KeyNote|
|Antonio M Lopez, (Computer Vision Center, Spain)
Learning to See in Virtual Worlds
|11:30 – 12:10||Oral Session 2|
|Latent Hierarchical Part Based Models for Road Scene Understanding.
Suhas Kashetty Venkateshkumar (Continental), Muralikrishna Sridhar (Continental), Patrick Ott (School of Computing, Univ. of Leeds).
|Semantic Mapping of Large-Scale Outdoor Scenes for Autonomous Off-Road Driving.
Fernando Bernuy (AMTC Universidad de Chile), Javier Ruiz del Solar (AMTC Univ. de Chile).
|Sequential Score Adaptation with Extreme Value Theory for Robust Railway Track Inspection.
Xavier Gibert (Univ. of Maryland), Vishal Patel (Rutgers Univ.), Rama Chellappa (Univ. of Maryland).
|Goal-Directed Pedestrian Prediction.
Eike Rehder (Inst. f. Meas.- and Control S.), Horst Kloeden (BMW Forschung und Technik GmbH).
|12:10 – 14:00||Lunch Break|
|14:00 – 15:00||KeyNote|
|Rudolf Mester, (Visual Sensorics and Information Processing, Goethe University, Germany)
Towards visual surround sensing: challenges, test data, and emerging methods
|15:00 – 15:30||Coffee Break|
|15:30 – 16:30||KeyNote|
|Ashesh Jain, (Department of Computer Science, Cornell University, USA)
Deep Learning for Sensor Rich Spatio-Temporal Problems: On Cars, Humans, and Robots
|16:30 – --||Poster Session|
Topics of Interest
Analyzing road scenes using cameras could have a crucial impact on many domains, such as autonomous driving, advanced driver assistance systems (ADAS), personal navigation, mapping of large-scale environments, and road maintenance. For instance, vehicle infrastructure, signage, and rules of the road have been designed to be interpreted fully by visual inspection. As the field of computer vision becomes increasingly mature, practical solutions to many of these tasks are now within reach. Nonetheless, a wide gap still exists between what is needed by the automotive industry and what is currently possible using computer vision techniques. The goal of this workshop is to allow researchers in the fields of road scene understanding and autonomous driving to present their progress and discuss novel ideas that will shape the future of this area. In particular, we would like this workshop to bridge the large gap between the community that develops novel theoretical approaches for road scene understanding and the community that builds working real-life systems performing in real-world conditions. To this end, we encourage submissions of original and unpublished work in the area of vision-based road scene understanding. The topics of interest include (but are not limited to):
- Prediction and modeling of road scenes and scenarios
- Semantic labeling, object detection and recognition in road scenes
- Dynamic 3D reconstruction, SLAM and ego-motion estimation
- Visual feature extraction, classification and tracking
- Processing for prosthetic (bionic) vision and low-vision assistive devices
- Design and development of robust and real-time architectures
- Use of emerging sensors (e.g., multispectral, RGB-D, LIDAR and LADAR)
- Fusion of RGB imagery with other sensing modalities
- Interdisciplinary contributions across computer vision, optics, robotics and other related fields.
We encourage researchers to submit not only theoretical contributions, but also work more focused on applications. Each paper will receive three double-blind reviews, which will be moderated by the workshop chairs.
Important Dates
- Submission Deadline: September 25th (Extended!).
- Notification of Acceptance: October 10.
- Camera-ready Deadline: October 15.
- Workshop: December 12.
Organizers
- Mathieu Salzmann (EPFL, Switzerland)
- Lars Petersson (NICTA, Australia)
- Jose Alvarez (NICTA, Australia)
Program Committee
- Alejandro Gonzalez Alzate, Computer Vision Center, Spain
- Amaury Dame, EPFL, Switzerland
- Andrea Fossati, ETH, Switzerland
- Andreas Geiger, Max Planck Institute, Germany
- Angel Sappa, Computer Vision Center, Spain
- Aura Hernandez, Computer Vision Center, Spain
- Carlos Fernandez, UAH, Spain
- Carlos Becker, EPFL, Switzerland
- Cristiano Premebida, University of Coimbra, Portugal
- David Vazquez, Computer Vision Center, Spain
- Eduard Trulls, EPFL, Switzerland
- Ferran Diego, Uni-Heidelberg, Germany
- Gary Overett, SYSU-CMU Joint Institute of Engineering, China
- Kwang Yi, EPFL, Switzerland
- Mårten Björkman, KTH, Sweden
- Pablo Marquez Neila, EPFL, Switzerland
- Stephen Gould, ANU, Australia
- Subarna Tripathi, UCSD, USA
- Xuming He, NICTA, Australia
Submission Instructions
- All papers must be written in English and submitted in PDF format.
- Papers must be submitted online through the CMT submission system. The submission site is: https://cmt2.research.microsoft.com/CVRSUAD2015.
- The maximum paper length is 8 pages. Note that shorter submissions are also welcome. The workshop paper format guidelines are the same as the Main Conference papers.
- Submissions will be rejected without review if they: contain more than 8 pages, violate the double-blind policy or violate the dual-submission policy. The author kit provides a LaTeX2e template for submissions, and an example paper to demonstrate the format. Please refer to this example for detailed formatting instructions.
- A paper ID will be allocated to you during submission. Please replace the asterisks in the example paper with your paper's own ID before uploading your file. More detailed instructions can be found at the main conference website.
Papers should describe original and unpublished work on the above or closely related topics. Each paper will receive double-blind reviews, moderated by the workshop chairs. Authors should take the above submission instructions into account.