Learning the depths of moving people by watching frozen people

The goal of this project was to test and extend Google's Mannequin Challenge project, which implements a method for predicting dense depth in scenarios where both a monocular camera and the people in the scene move freely. At inference time, the method uses motion parallax cues from the static regions of the scene to guide depth prediction. The neural network is trained on filtered YouTube videos in which people imitate mannequins, i.e., freeze in elaborate, natural poses while a hand-held camera tours the scene.
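The role of motion parallax can be illustrated with a toy two-frame case. This is a minimal sketch, not the project's pipeline. It assumes the camera translates purely sideways by a known baseline `B` between the two frames, with a known focal length `f` in pixels, per-pixel horizontal optical flow, and a binary human mask. All of these are hypothetical inputs chosen for illustration. Under that assumption, horizontal flow acts like stereo disparity `d`, so a static point's depth is `Z = f * B / d`. Pixels covered by people are left unknown, because their own motion would corrupt the parallax estimate, and that is where a learned model must fill in depth.

```python
from typing import List, Optional


def parallax_depth(
    flow_x: List[List[float]],
    human_mask: List[List[bool]],
    focal_px: float,
    baseline_m: float,
    min_disparity: float = 1e-3,
) -> List[List[Optional[float]]]:
    """Triangulate depth from parallax for static (non-human) pixels only.

    Simplifying assumption (hypothetical setup): the camera translated purely
    along its x-axis by baseline_m between the two frames, so the magnitude of
    the horizontal flow equals the stereo disparity in pixels.
    Human pixels and pixels with negligible parallax are returned as None.
    """
    depth: List[List[Optional[float]]] = []
    for flow_row, mask_row in zip(flow_x, human_mask):
        out_row: List[Optional[float]] = []
        for dx, is_human in zip(flow_row, mask_row):
            disparity = abs(dx)  # sign depends only on the direction of camera motion
            if is_human or disparity < min_disparity:
                out_row.append(None)  # moving person, or too little parallax to triangulate
            else:
                out_row.append(focal_px * baseline_m / disparity)  # Z = f * B / d
        depth.append(out_row)
    return depth


if __name__ == "__main__":
    flow = [[10.0, 5.0], [2.0, 0.0]]
    mask = [[False, True], [False, False]]
    print(parallax_depth(flow, mask, focal_px=500.0, baseline_m=0.1))
    # prints [[5.0, None], [25.0, None]]
```

Real footage involves general 6-DoF camera motion, so a full system would estimate camera poses and triangulate along epipolar lines rather than rely on this sideways-translation shortcut.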