VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality. To interface a highly sparse LiDAR point cloud with a region proposal network (RPN), most existing efforts have focused on hand-crafted feature representations, for example, a bird's eye view projection. In this work, we remove the need for manual feature engineering on 3D point clouds and propose VoxelNet, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network. Specifically, VoxelNet divides a point cloud into equally spaced 3D voxels and transforms the group of points within each voxel into a unified feature representation through the newly introduced voxel feature encoding (VFE) layer. In this way, the point cloud is encoded as a descriptive volumetric representation, which is then connected to an RPN to generate detections. Experiments on the KITTI car detection benchmark show that VoxelNet outperforms state-of-the-art LiDAR-based 3D detection methods by a large margin. Furthermore, our network learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists, based on LiDAR only.
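The two ideas the abstract names, partitioning the point cloud into equally spaced 3D voxels and encoding each voxel's points with a VFE layer, can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's implementation: the linear layer below is random and untrained (the actual VFE uses learned fully connected layers with batch norm and ReLU, and concatenates point-wise features with the max-pooled aggregate before pooling again), and all function names and dimensions here are illustrative assumptions.

```python
import numpy as np

def voxelize(points, voxel_size, grid_min):
    """Group points (N, 3) into equally spaced 3D voxels.

    voxel_size: (3,) edge lengths; grid_min: (3,) lower corner of the grid.
    Returns a dict mapping voxel index (i, j, k) -> (M, 3) array of its points.
    """
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    voxels = {}
    for p, i in zip(points, idx):
        voxels.setdefault(tuple(i), []).append(p)
    return {k: np.stack(v) for k, v in voxels.items()}

def vfe(voxel_points, out_dim=8, rng=None):
    """Toy voxel feature encoding (simplified from the paper's VFE layer).

    Augments each point with its offset from the voxel centroid, applies a
    random (untrained) linear map + ReLU per point, then max-pools over the
    points to produce one fixed-size feature vector per voxel.
    """
    rng = rng or np.random.default_rng(0)
    centroid = voxel_points.mean(axis=0)
    aug = np.concatenate([voxel_points, voxel_points - centroid], axis=1)  # (M, 6)
    W = rng.standard_normal((aug.shape[1], out_dim))  # stand-in for a learned layer
    pointwise = np.maximum(aug @ W, 0.0)              # per-point features, (M, out_dim)
    return pointwise.max(axis=0)                      # element-wise max pool -> (out_dim,)
```

Stacking the per-voxel vectors back onto the 3D grid yields the dense volumetric representation that the abstract says is fed to the RPN; empty voxels simply contribute zeros.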

References in zbMATH (referenced in 7 articles)

Showing results 1 to 7 of 7.
Sorted by year (citations)

  1. Guo, Rui; Zhou, Yong; Zhao, Jiaqi; Yao, Rui; Liu, Bing; Zhang, Xunhui: Unsupervised spatial-awareness attention-based and multi-scale domain adaption network for point cloud classification (2021)
  2. Wu, Zhenni; Chen, Hengxin; Fang, Bin; Li, Zihao; Chen, Xinrun: Building pose estimation from the perspective of UAVs based on CNNs (2021)
  3. Liu, Xinhai; Han, Zhizhong; Hong, Fangzhou; Liu, Yu-Shen; Zwicker, Matthias: LRC-net: learning discriminative features on point clouds by encoding local region contexts (2020)
  4. Wagner, Nicolas; Schwanecke, Ulrich: NeuralQAAD: an efficient differentiable framework for high resolution point cloud compression (2020) arXiv
  5. Xie, Qian; Lai, Yu-Kun; Wu, Jing; Wang, Zhoutao; Zhang, Yiming; Xu, Kai; Wang, Jun: MLCVNet: multi-level context VoteNet for 3D object detection (2020) arXiv
  6. Patil, A.; Malla, S.; Gang, H.; Chen, Y.-T.: The H3D dataset for full-surround 3D multi-object detection and tracking in crowded urban scenes (2019) arXiv
  7. Sun, Pei; Kretzschmar, Henrik; Dotiwalla, Xerxes; Chouard, Aurelien; Patnaik, Vijaysai; Tsui, Paul; Guo, James; Zhou, Yin; Chai, Yuning; Caine, Benjamin; Vasudevan, Vijay; Han, Wei; Ngiam, Jiquan; Zhao, Hang; Timofeev, Aleksei; Ettinger, Scott; Krivokon, Maxim; Gao, Amy; Joshi, Aditya; Zhao, Sheng; Cheng, Shuyang; Zhang, Yu; Shlens, Jonathon; Chen, Zhifeng; Anguelov, Dragomir: Scalability in perception for autonomous driving: Waymo open dataset (2019) arXiv