What can computers tag for?

Peter Krogh
Mon Jul 27 2020

Machine Learning and other AI services can add some useful information to a visual library, but they can only tag for things they “understand”. Some subjects are relatively easy to train a computer to recognize. Some are very hard, and some are nearly impossible.

The tagging capabilities will be an ever-growing list, determined in large part by the willingness of people and companies to pay for these services. As of this writing, the following categories are becoming fairly common:

  • Objects shown - This was one of the first goals of AI services, and has come a long way. Most computational tagging services can identify common objects, landscapes and other generically identifiable elements.
  • People and activities shown - AI services can usually identify if a person appears in a photo. They typically won’t know who the person is unless it’s a celebrity, or unless the service has been trained for that particular person. Many activities can now be recognized by AI services, running the gamut from sports to work to leisure.
  • Species shown - Not long ago, it was hard for Artificial Intelligence to tell the difference between a cat and a dog. Now, it’s common for services to be able to tell you which breed of cat or dog (as well as many other animals and plants). This is a natural fit for a machine learning project, since plants and animals are a well-categorized training set, and there are a lot of apparent use cases.
  • Place shown - Even when no GPS data is included, some services can identify a location by the visual appearance of a famous building or other landmark.
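Services in these categories typically return each label along with a confidence score, and the library keeps only the labels above some cutoff. Here is a minimal Python sketch of that filtering step; the response shape, field names, and threshold below are assumptions, loosely modeled on label-detection APIs such as Google Cloud Vision rather than any service's exact format.

```python
# Sketch: turning a tagging service's response into keywords.
# The response shape is a simplified assumption, not a real API contract.

CONFIDENCE_THRESHOLD = 0.70  # arbitrary cutoff; tune per service

def labels_to_keywords(response, threshold=CONFIDENCE_THRESHOLD):
    """Keep only the labels the service is reasonably confident about."""
    return [
        label["description"]
        for label in response.get("labelAnnotations", [])
        if label.get("score", 0.0) >= threshold
    ]

# Hypothetical response for a photo of a dog on a beach:
sample_response = {
    "labelAnnotations": [
        {"description": "Dog", "score": 0.98},
        {"description": "Beach", "score": 0.91},
        {"description": "Golden retriever", "score": 0.74},
        {"description": "Vacation", "score": 0.42},  # below cutoff, dropped
    ]
}

print(labels_to_keywords(sample_response))
# ['Dog', 'Beach', 'Golden retriever']
```

Where you set the threshold is a real editorial decision: too low and the library fills with noisy guesses, too high and useful tags get thrown away.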

Here’s an example of a location that Google Cloud Vision was able to recognize. It also gives you the GPS location of the castle shown.

  • Adult content - Many computational tagging services can identify adult content, which is quite useful for automatic filtering. Of course, notions of what constitutes adult content vary greatly by culture.
  • Readable text - Optical Character Recognition has been a staple of AI services since the very beginning. This is now being extended to handwriting recognition. And once information has been turned into text, it’s possible to translate the text into multiple languages.
  • Natural Language Processing - It’s one thing to be able to read text; it’s another to understand its meaning. Natural Language Processing (NLP) is the study of the way we use language. NLP allows a system to understand slang and metaphors in addition to strict literal meaning. For example, it can recognize that “how much did those shoes set you back?” is a question about price. NLP is important in tagging, but even more important in the search process.
  • Sentiment analysis - Tagging systems may be able to add some tags that describe sentiments. One example: it’s getting common for services to categorize facial expressions as being happy, sad or angry. Whether they are correct is another story.
  • Situational analysis - One of the next great leaps in computational tagging will be true machine learning capability for situational analysis. Some of this is straightforward (e.g., “This is a soccer game”). Some is more difficult (e.g., “This is a dangerous situation”). At the moment, a lot of situational analysis is actually rule based (e.g., “Add the keyword ‘vacation’ when you see a photo of a beach”).
  • Celebrities - There is a big market for celebrity photos, and there are excellent training sets. A number of services do this quite well.
  • Trademarks and products - Trademarks are also easy to identify, and there is a ready market for trademark identification. For example, “alert me whenever our trademark shows up in someone’s Instagram feed.”
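The rule-based situational analysis mentioned above (e.g., adding “vacation” when a beach appears) can be sketched very simply: each rule fires when its trigger tags are present. The rules and tags below are made up for illustration.

```python
# Sketch of rule-based situational tagging: each rule adds a keyword
# when all of its trigger tags are already present on the image.
# These example rules are invented, not from any real service.

RULES = [
    ({"beach"}, "vacation"),
    ({"ball", "goal"}, "soccer game"),
    ({"fire", "crowd"}, "dangerous situation"),
]

def apply_rules(tags, rules=RULES):
    tags = set(tags)
    for triggers, keyword in rules:
        if triggers <= tags:  # every trigger tag is present
            tags.add(keyword)
    return tags

print(sorted(apply_rules({"beach", "sand", "ocean"})))
# ['beach', 'ocean', 'sand', 'vacation']
```

This is exactly why rule-based situational analysis is brittle: the rules only fire on the tags someone thought to write down, which is why true machine-learned situational analysis would be such a leap.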

Google Cloud Vision can identify both Canon and Coca-Cola logos in this photo. However, it does not seem to find Fuji, Philips, JVC or Lowenbrau.

  • Graphic elements - ML services can evaluate images according to nearly any graphic component. This includes shapes and colors in an image. These can be used to find similar images across a single collection or on the web at large. This was an early capability of rule-based AI services, and remains an important goal for both Machine Learning and Deep Learning services.
  • Captions - Some services can create captions from the analysis they make. Currently, these tend to be a bit comical. But as all the capabilities above improve, automated captions should become genuinely useful.
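Similarity search on graphic elements can be illustrated with a toy “average hash”: threshold each pixel against the image’s mean brightness, then compare hashes by how many bits differ. The tiny grayscale grids below stand in for real pixel data; production systems use larger hashes or learned embeddings.

```python
# Toy sketch of similar-image search from graphic components alone.
# average_hash: 1 bit per pixel (above/below the mean brightness);
# similar compositions produce similar hashes.

def average_hash(pixels):
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p >= mean else 0 for p in flat)

def hamming(h1, h2):
    """Number of differing bits: 0 means visually very similar."""
    return sum(a != b for a, b in zip(h1, h2))

img_a = [[200, 200], [10, 10]]   # bright top, dark bottom
img_b = [[190, 210], [20, 5]]    # same composition, slightly different
img_c = [[10, 10], [200, 200]]   # inverted composition

print(hamming(average_hash(img_a), average_hash(img_b)))  # 0 -> near-duplicate
print(hamming(average_hash(img_a), average_hash(img_c)))  # 4 -> dissimilar
```

Because the hash ignores fine detail, it finds images with the same overall shapes and tonal layout, which is precisely the “graphic elements” matching described above.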

Trainable services

Most of the tagging listed above can be incorporated into a generic AI tagging service. But some people will want a tagging tool that can identify very specific items. If you want to identify specific people who are not celebrities, you’ll need to train the system to recognize them. The same is true for most product identification services. In these cases, you’ll need an AI system that lets you supply a set of training images and provide feedback on accuracy. These services could use rule-based recognition, Machine Learning or Deep Learning, depending on the requirements.
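The train-on-your-own-examples idea can be reduced to a toy: average the feature vectors of each person’s training images into a centroid, then classify new images by the nearest centroid. Real services derive features from learned deep embeddings; the two-dimensional “features” and names here are invented purely for illustration.

```python
# Toy sketch of a trainable recognizer (nearest-centroid classification).
# The feature vectors below are made up; real systems would compute
# high-dimensional embeddings from the actual training images.

def train(examples):
    """examples: {name: [feature_vector, ...]} -> {name: centroid}"""
    centroids = {}
    for name, vectors in examples.items():
        dims = len(vectors[0])
        centroids[name] = [
            sum(v[d] for v in vectors) / len(vectors) for d in range(dims)
        ]
    return centroids

def classify(centroids, vector):
    """Return the name whose centroid is closest to the new vector."""
    def dist2(name):
        return sum((a - b) ** 2 for a, b in zip(centroids[name], vector))
    return min(centroids, key=dist2)

training_images = {
    "Alice": [[0.9, 0.1], [0.8, 0.2]],  # hypothetical people
    "Bob":   [[0.1, 0.9], [0.2, 0.8]],
}
model = train(training_images)
print(classify(model, [0.85, 0.15]))  # Alice
```

The feedback loop the text describes maps onto this sketch naturally: when the system misclassifies an image, you add it to the right person’s example list and retrain, nudging that centroid toward the correction.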

Is Mediagraph right for your organization?

Let’s find out together.

Book your demo today