Window Shopping Google ARCore: Concepts

I thought Google's ARCore SDK offered interesting capabilities for robots. So even though the SDK team is explicitly not considering robotics applications, I wanted to take a look.

The obvious starting point is ARCore's "Fundamental Concepts" document. Here we can confirm the theory operation is consistent with an application of Structure from Motion algorithms. Out of all the possible type of information that can be extracted via SfM, a subset is exposed to applications using the ARCore SDK.

Under "Environmental Understanding" we see the foundation supporting AR applications: an understanding of the phone's position in the world, and of surfaces that AR objects can interact with. ARCore picks out horizontal surfaces (tables, floor) upon which an AR object can be placed, or vertical surfaces (walls) upon which AR images can be hung like a picture. All other features build on top of this basic foundation, which also feel useful for robotics: most robots only navigate on horizontal surfaces, and try to avoid vertical walls. Knowing where they are relative to current position in the world would help collision detection.

The depth map is a new feature that caught my attention in the first place, used for object occlusion. There is also light estimation, helping to shade objects to fit in with their surroundings. Both of these allow a more realistic rendering of a virtual object in real space. While the depth map has obvious application for collision detection and avoidance more useful than merely detecting vertical wall surfaces. Light estimation isn't obviously useful for a robot, but maybe interesting ideas will pop up later.

In order for users to interact with AR objects, the SDK includes the ability to map the user's touch coordinate in 2D space into the corresponding location in 3D space. I have a vague feeling it might be useful for a robot to know where a particular point in view is in 3D space, but again no immediate application comes to mind.

ARCore also offers "Augmented Images" that can overlay 3D objects on top of 2D markers. One example offered: "for instance, they could point their phone's camera at a movie poster and have a character pop out and enact a scene." I don't see this as a useful capability in a robotics application.

But as interesting as these capabilities are, they are focused on a static snapshot of a single point in time. Things get even more interesting once we are on the move and correlate data across multiple points in space or even more exciting, multiple devices.