When I decided to look into an augmented reality SDK with robotics applications in mind, I chose Google's ARCore over Apple's ARKit for a few reasons. The first is hardware: I have been using Android phones, so I have several pieces of ARCore-compatible hardware on hand. I also have access to computers that I might be able to draft into Android development duty. In contrast, Apple ARKit development requires a macOS desktop machine plus iOS hardware, both of which are more expensive and rarer in my circles.

The second reason was Google's announcement that ARCore now has a Depth API. The announcement included two animated GIFs that immediately caught my attention. The first shows that they can generate a depth map, with color corresponding to distance from the camera.

[caption id="attachment_21845" align="aligncenter" width="300"]ARCore depth map Image source: Google[/caption]

This is the kind of data I had previously seen from an Xbox 360 Kinect sensor bar, except the Kinect used an infrared beam projector and an infrared camera to construct that depth information on top of its RGB camera. In comparison, Google's demo implies that they can derive similar information from just an RGB camera. Given such a depth map, it should theoretically be possible to use it much like a Kinect, except now the sensor would be far smaller, battery powered, and, unlike the Kinect, able to work in bright sunlight.
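As a rough illustration of what I mean by "use it like a Kinect": a depth map plus the camera's intrinsics is enough to back-project every pixel into a 3D point cloud. Below is a minimal sketch of that standard pinhole-camera math; the function and parameter names are my own placeholders, not anything from the ARCore API.

```kotlin
// Sketch: convert a depth map (millimeters per pixel) into a 3D point cloud,
// the same way Kinect depth data is typically consumed.
// fx, fy, cx, cy are camera intrinsics a real app would have to query.
data class Point3(val x: Float, val y: Float, val z: Float)

fun depthMapToPointCloud(
    depthMm: IntArray,     // depth per pixel in millimeters; 0 = no estimate
    width: Int,
    height: Int,
    fx: Float, fy: Float,  // focal lengths, in pixels
    cx: Float, cy: Float   // principal point, in pixels
): List<Point3> {
    val points = ArrayList<Point3>()
    for (v in 0 until height) {
        for (u in 0 until width) {
            val mm = depthMm[v * width + u]
            if (mm == 0) continue           // skip pixels with no depth data
            val z = mm / 1000.0f            // convert to meters
            val x = (u - cx) * z / fx       // standard pinhole back-projection
            val y = (v - cy) * z / fy
            points.add(Point3(x, y, z))
        }
    }
    return points
}
```

Feed that point cloud into the same obstacle-detection or mapping pipelines people have built around the Kinect, and in principle the phone takes the sensor bar's place.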

[caption id="attachment_21844" align="aligncenter" width="235"]ARCore occlusion Image source: Google[/caption]

Here is that data used in an ARCore context: letting augmented reality objects be properly occluded by obstacles in the real world. I found this clip comforting because its slight imperfections assured me this was live data from a new technology, and not a Photoshop rendering of what they hope to accomplish.

That is always the first question we need to ask of anything we see on the internet: is it real? The depth map animation isn't detailed enough for me to tell whether it's too perfect to be true. But the occlusion demo is definitely not too perfect: there are flaws in the object occlusion as the concrete wall moves in and out of the line of sight between us and the animated robot. This is most apparent in the second half of the clip: as the concrete wall retreats, we can see bits of the stairs that should have been covered up by the robot but are still visible because the depth map hadn't caught up yet.

Incomplete occlusion
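The occlusion effect itself is conceptually simple once a depth map exists: for every pixel, compare the virtual object's depth against the real-world depth and only draw the virtual object where it is closer. ARCore does this on the GPU as part of rendering; the toy CPU-side sketch below just shows the comparison, and the buffer names are hypothetical rather than anything from the ARCore API.

```kotlin
// Toy sketch of depth-based occlusion: a virtual object's pixel is drawn only
// if it is closer to the camera than the real-world surface at that pixel.
// All buffers are hypothetical per-pixel arrays of the same size.
fun composite(
    realDepth: FloatArray,      // real-world depth per pixel, in meters
    virtualDepth: FloatArray,   // virtual object depth per pixel, in meters (infinity = no object)
    virtualColor: IntArray,     // rendered virtual object color per pixel (ARGB)
    cameraColor: IntArray,      // camera image per pixel (ARGB)
    output: IntArray            // composited result
) {
    for (i in output.indices) {
        output[i] = if (virtualDepth[i] < realDepth[i]) {
            virtualColor[i]     // virtual object is in front: show it
        } else {
            cameraColor[i]      // real obstacle is closer: it occludes the virtual object
        }
    }
}
```

The imperfect occlusion in the clip is consistent with this comparison being fed a real-world depth buffer that lags slightly behind the camera image.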

So this looks nifty, but what was the math magic that made it possible?