I Do Not (Yet?) Meet The Prerequisites For Multiple View Geometry in Computer Vision

Python may not be required for performing computer vision with or without OpenCV, but it does make exploration easier. Unfortunately there are limits to the magic of Python, whatever the glowing reviews, humorous or serious, might suggest. Extracting world geometry from an image is still a very challenging area of active research, and it is very important for robots that need to understand their surroundings for navigation.

My understanding of computer vision says that image segmentation comes very close to an answer here, and while it is useful for robotic navigation applications such as autonomous vehicles, it is not quite the whole picture. In the example image, pixels are assigned to a nearby car, but such an assignment doesn't tell us how big that car is or how far away it is. For a robot to successfully navigate that situation, it doesn't even really need to know whether a certain blob of pixels corresponds to a car. It just needs to know there's an object, and it needs to know the movement of that object to avoid colliding with it.

For that information, most of today's robots use an active sensor of some sort: expensive LIDAR for self-driving cars capable of highway speeds, repurposed gaming peripherals for indoor hobby robot projects. But those active sensors each have their own limitations. For the Kinect sensor I had experimented with, the limitations were that it had a very limited range and it only worked indoors. Ideally I would want something using passive sensors like stereoscopic cameras to extract world geometry much as humans do with our eyes.
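To make the stereoscopic idea a little more concrete, here is a minimal sketch of the textbook relationship for an idealized, already-rectified pair of identical cameras: the distance to a point is the focal length times the spacing between the cameras, divided by how far the point shifts between the two images (the disparity). The function name and every number below are hypothetical, chosen only for illustration, and this is not taken from the book or from any particular library.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Distance in meters to a point seen by an idealized rectified stereo pair.

    focal_length_px: camera focal length, expressed in pixels
    baseline_m:      spacing between the two camera centers, in meters
    disparity_px:    horizontal shift of the point between left and right images
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point infinitely far away.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical camera: 700 px focal length, lenses 6 cm apart.
    focal_length_px = 700.0
    baseline_m = 0.06
    for disparity_px in (70.0, 7.0, 0.7):
        depth_m = depth_from_disparity(focal_length_px, baseline_m, disparity_px)
        print(f"disparity {disparity_px:5.1f} px -> depth {depth_m:6.2f} m")
```

The arithmetic is the easy part. With these made-up numbers, an object tens of meters away shifts by less than one pixel, so small matching errors turn into large distance errors. Calibrating real cameras and deciding which pixel in one image corresponds to which pixel in the other are the hard parts.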

I did a bit of research, following citations, to figure out where I might get started learning the foundations of this field. One title that came up frequently is the text Multiple View Geometry in Computer Vision (*). I found the web page for this book, where I was able to download a few sample chapters. These sample chapters were enough for me to decide I do not (yet) meet the prerequisites for this book. Having a robot make sense of the world via multiple cameras and computer vision is going to take a lot more work than telling Python to import vision.

Given the prerequisites, it looks pretty unlikely I will do this kind of work myself. (Or more accurately, I'm not willing to dedicate the amount of study I'd need to do so.) But that doesn't mean it's out of reach. It just means I have to find some related previous work to leverage. "Understand the environment seen by a camera" is a desire that applies to more than just robotics.

(*) Disclosure: As an Amazon Associate I earn from qualifying purchases.