Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks

We present Occlusion-Net, a framework to predict 2D and 3D locations of occluded keypoints for objects, in a largely self-supervised manner. We use an off-the-shelf detector (such as Mask R-CNN) as input, trained only on visible keypoint annotations. This is the only supervision used in this work. A graph encoder network then explicitly classifies invisible edges, and a graph decoder network corrects the occluded keypoint locations from the initial detector. Central to this work is a trifocal tensor loss that provides indirect self-supervision for occluded keypoint locations that are visible in other views of the object. The 2D keypoints are then passed into a 3D graph network that estimates the 3D shape and camera pose using a self-supervised re-projection loss. At test time, our approach successfully localizes keypoints in a single view under a diverse set of severe occlusion settings. We demonstrate and evaluate our approach on synthetic CAD data as well as a large image set capturing vehicles at many busy city intersections. As an interesting aside, we compare the accuracy of human labels of invisible keypoints against those obtained from the geometric trifocal tensor loss.
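
The re-projection loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain pinhole camera with a single focal length, a known rotation and translation, and no visibility weighting. The abstract does not give the actual formulation, so the function names `project` and `reprojection_loss` and all parameters are hypothetical and are not taken from the Occlusion-Net code.

```python
def project(point3d, rotation, translation, focal):
    """Project one 3D point with an assumed pinhole camera.

    rotation is a 3x3 nested list, translation a length-3 sequence,
    focal a scalar focal length in pixels (all illustrative assumptions).
    """
    # Transform the point into the camera frame: cam = R * X + t
    cam = [
        sum(rotation[r][c] * point3d[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by scaling with the focal length
    return (focal * cam[0] / cam[2], focal * cam[1] / cam[2])


def reprojection_loss(points3d, keypoints2d, rotation, translation, focal):
    """Mean squared pixel distance between projected 3D and given 2D keypoints."""
    total = 0.0
    for p3, (u, v) in zip(points3d, keypoints2d):
        pu, pv = project(p3, rotation, translation, focal)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total / len(points3d)


# Usage: identity rotation, camera 5 units in front of the object
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
shape3d = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
observed = [(23.0, 4.0), (0.0, 20.0)]  # first keypoint is off by (3, 4) pixels
loss = reprojection_loss(shape3d, observed, identity, (0.0, 0.0, 5.0), 100.0)
```

In this toy setup the loss is zero when the 2D keypoints agree exactly with the projected 3D shape and grows with the squared pixel error, which is why such a term can be minimized with respect to shape and pose without 3D labels.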

References in zbMATH (referenced in 1 article)

  1. Sven Kreiss, Lorenzo Bertoni, Alexandre Alahi: OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association (2021) arXiv