What is Computer Vision Anyway?

Thankfully, the bottom-up approach our brains use has proven to be more practical. By applying a series of transformation algorithms to discover edges, infer objects from those edges, find perspective and movement across multiple images, and so on, computers can be trained to see things much the way our brains were.
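To make the first step of that pipeline concrete, here is a minimal sketch of edge discovery. It assumes a Sobel filter, a standard technique chosen purely for illustration; the article does not say which algorithms any particular system uses. The function names, the threshold value, and the toy image are all hypothetical.

```python
# Sobel kernels: small 3x3 filters that respond to horizontal and vertical
# changes in brightness. (Assumed here for illustration only.)
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def apply_kernel(image, kernel, row, col):
    """Weighted sum of the 3x3 neighborhood centered on (row, col)."""
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            total += kernel[dr + 1][dc + 1] * image[row + dr][col + dc]
    return total


def edge_map(image, threshold=2.0):
    """Return a grid of 0/1 flags marking pixels with a strong gradient.

    Border pixels stay 0 because they lack a full 3x3 neighborhood.
    The threshold is an arbitrary, hypothetical cutoff.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = apply_kernel(image, SOBEL_X, r, c)
            gy = apply_kernel(image, SOBEL_Y, r, c)
            magnitude = (gx * gx + gy * gy) ** 0.5
            edges[r][c] = 1 if magnitude >= threshold else 0
    return edges


# Hypothetical 5x6 grayscale image: dark on the left, bright on the right.
toy_image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

for row in edge_map(toy_image):
    print(row)
```

On the toy image, the flagged pixels line up along the vertical boundary between the dark and bright halves. Later stages would group such edge pixels into outlines and then into candidate objects, which is where the much harder modeling work begins.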

Advancements in AI and big-data processing have been the key to achieving the level of complex math necessary to do this accurately at scale. The result has brought computer vision light years ahead of where we were just a few years ago, to the point where computers can “tag” thousands of objects fairly accurately.


 

Image from Purdue’s E-Lab showing examples of objects that look and behave similarly

Comprehension and beyond

Now that we have a system that can recognize many varieties and behaviors of objects from multiple angles and in many situations, we reach the most difficult problem: teaching computers to comprehend what they see. Just because a computer can correctly identify a banana in all situations doesn’t mean it knows what bananas are, whether they are edible, or that they come from tropical climates.

To be effective, good hardware and software require an operating system. For people, the rest of the brain acts as the operating system that connects and makes sense of all its individual processes: memory, the other four senses, attention and focus, and the collective lessons of our experiences.

All of these are connected in ways we barely understand, encoded in a language we can only attempt to comprehend, and living in a network of neurons more interconnected and frustratingly complex than anything else we’ve tried to uncover (except for maybe particle physics and string theory).

Here is where the leading edges of computer science, general AI, psychology, neuroscience, and philosophy collide. The goal is to understand, on a functional level, the way our minds work, and to replicate those systems in machines.

Right now those siloed systems are producing incredible advances like self-driving cars, facial recognition, and safe and efficient factory robots. Barron’s estimates that by 2021 the value of computer vision for AI will top $3 billion and continue to grow at a 30% compound annual growth rate.

That creates a lot of incentive to begin tackling the deeper problems of context and intention. There’s still a long way to go, and the most complex problems still lie ahead, but considering the scale of the problem it’s incredible we’ve even gotten this far.

This article was republished with permission from Cortex.