AI Talks 3: How Computers 'See' Photos

Today, we’ll chop up images to create visual vocabularies and explain how computers extract information from photos. The technique we’ll learn is called Bag of Visual Words. It was invented in the early 2000s in the south of France and rapidly became a standard way of training computers to recognise what is in an image.

 

Watch the video explanation if you don't feel like reading!

 

A picture of Taylor Swift, a pair of scissors and a brown bag are all we need to understand how computers represent images in their internal memory.

 
 

Step one is to take the image and chop it into pieces. We put the pieces into a bag, shake them up (so we no longer care where in the image each piece came from) and create a histogram of the most salient parts of the image. Simply put, we keep only the most distinguishable features. In this case, the face’s most distinguishable features are the mouth, eyes and hair. We repeat the process for more images, for example with a bike and an iPhone.
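If you'd like to see the "scissors" step in code, here is a minimal sketch. It assumes the image is just a small grid of brightness values and that we cut it into equal squares; the function name `chop_into_patches` and the toy pixel values are made up for illustration, and real systems usually pick out interesting regions rather than cutting a plain grid.

```python
def chop_into_patches(image, patch_size):
    """Split a 2-D grid of pixel values into non-overlapping square patches.

    Each patch is returned as a flat tuple of pixel values, read row by row.
    Rows and columns that do not fill a whole patch are dropped.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = tuple(
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            )
            patches.append(patch)
    return patches


# A toy 4x4 greyscale "image" (hypothetical values: 0 = black, 9 = white).
toy_image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [5, 5, 1, 1],
    [5, 5, 1, 1],
]

patches = chop_into_patches(toy_image, patch_size=2)
print(patches)
# [(0, 0, 0, 0), (9, 9, 9, 9), (5, 5, 5, 5), (1, 1, 1, 1)]
```

Once the pieces are in the list, their position in the image is forgotten, which is exactly what shaking the bag does.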

 
Screenshot (243).png
 

We now have what is called a ‘visual vocabulary’. A visual vocabulary is a compact and representative set of image parts that can be used to represent a lot of pictures. Let’s see how we can use it to represent the new image that you can see below.
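How might a computer build such a vocabulary on its own? A common approach, and an assumption here since the steps above don't name one, is to group similar pieces together with k-means clustering and keep one "average" piece per group as a visual word. The sketch below is a tiny, simplified version with made-up two-number pieces; `build_vocabulary` and its parameters are illustrative names, not part of any library.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(patch, vocabulary):
    """Index of the visual word that is closest to the patch."""
    return min(
        range(len(vocabulary)),
        key=lambda i: squared_distance(patch, vocabulary[i]),
    )


def build_vocabulary(patches, num_words, iterations=10):
    """Tiny k-means: group similar patches, keep one average patch per group."""
    # Assumption: start from the first `num_words` distinct patches so the
    # result is deterministic (real implementations usually start randomly).
    vocabulary = []
    for patch in patches:
        candidate = tuple(float(value) for value in patch)
        if candidate not in vocabulary:
            vocabulary.append(candidate)
        if len(vocabulary) == num_words:
            break

    for _ in range(iterations):
        # Assign every patch to its closest visual word.
        groups = [[] for _ in vocabulary]
        for patch in patches:
            groups[nearest_word(patch, vocabulary)].append(patch)
        # Move each word to the average of its group (keep it if the group is empty).
        vocabulary = [
            tuple(sum(column) / len(group) for column in zip(*group)) if group else word
            for word, group in zip(vocabulary, groups)
        ]
    return vocabulary


# Hypothetical pieces: two dark ones and two bright ones.
training_patches = [(0, 0), (1, 1), (9, 9), (10, 10)]
vocabulary = build_vocabulary(training_patches, num_words=2)
print(vocabulary)
# [(0.5, 0.5), (9.5, 9.5)]
```

The two resulting words sit in the middle of the dark pieces and the bright pieces, so a handful of words can stand in for many similar pieces.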

 
Screenshot (244).png
 

As you can see, the image depicts a lady riding a bike. As before, we chop the image into pieces, discard the irrelevant ones and keep only the pieces that are represented in the visual vocabulary. In this case, we have two pieces related to the face and four related to the bike.

The last thing we have to do is to put each piece into the right slot of our visual vocabulary, counting how many pieces match each visual word.

 
1234567.png
 

And that’s it! We have used the visual vocabulary to turn an image into a very compact and simple format that the computer can understand: a short list of counts.
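To make the counting step concrete, here is a small sketch that reproduces the example above: two face pieces, four bike pieces and no phone pieces. The vocabulary entries, the piece values and the `max_squared_distance` cut-off used to discard irrelevant pieces are all hypothetical, chosen only so the numbers match the story.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_image(patches, vocabulary, max_squared_distance):
    """Count how many patches match each visual word.

    Patches that are further than `max_squared_distance` from every word are
    treated as irrelevant and dropped.
    """
    counts = Counter()
    for patch in patches:
        word, distance = min(
            ((name, squared_distance(patch, centre)) for name, centre in vocabulary.items()),
            key=lambda pair: pair[1],
        )
        if distance <= max_squared_distance:
            counts[word] += 1
    # One count per visual word, in vocabulary order.
    return [counts[name] for name in vocabulary]


# Hypothetical vocabulary: one representative piece per visual word.
vocabulary = {"face": (1.0, 1.0), "bike": (5.0, 5.0), "phone": (9.0, 9.0)}

# Hypothetical pieces cut from the new image of a lady riding a bike.
new_image_patches = [
    (1, 1), (1, 2),                  # look like the face
    (5, 5), (5, 4), (4, 5), (6, 5),  # look like the bike
    (20, 20), (30, 0),               # background, matches nothing
]

histogram = encode_image(new_image_patches, vocabulary, max_squared_distance=2.0)
print(histogram)
# [2, 4, 0]
```

Those three numbers are the whole image as far as the computer is concerned, and lists like this are what a classifier is then trained on.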

 
Screenshot (251).png
 

So to conclude, today we learnt three things. Firstly, we learnt that Bag of Visual Words is a powerful technique for classifying images. Secondly, we learnt the importance of a visual vocabulary for interpreting the visual signals around us. Thirdly, we learnt how computers use that visual vocabulary to represent an image in their internal memory.

We hope that you enjoyed this blog post and video - don’t forget to subscribe to our YouTube channel.

Find out how to use AI in your market research activities! Schedule a demo with us below.