Tandem Vault Blog

Programmatic Tagging


In today’s post, I outline some of the types of tagging that can be done automatically. 

Let’s face it: (almost) no one wants to spend lots of time tagging images, and part of the appeal of photographic communication is that it lets you avoid tapping out written descriptions of stuff. As image collections grow at an accelerating rate, Artificial Intelligence tools are becoming more important in classifying images. Taken together, these new tools will be an essential part of creating the semantics of imagery.

Programmatic capabilities

Let’s take a look at some of the capabilities that fall under Artificial Intelligence and computational tagging. Some are bundled together within a single service, while others are freestanding capabilities.

Machine learning – Computers can be trained to do all kinds of visual recognition tasks—from identifying species and reading handwriting, to looking for defects in manufactured items. Machine learning is the broad category encompassing any trainable system. Some systems rely on centralized servers and databases, and some can be run locally on your own computer. 

Machine learning tags typically come with a confidence rating. Sometimes these ratings feel a bit overconfident.
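To make that concrete, here’s a minimal sketch of a trainable recognizer that runs entirely on your own machine, using scikit-learn’s bundled handwritten-digit dataset. Commercial tagging services work on the same basic principle at much larger scale, and the confidence score at the end is the same kind of rating I mentioned above.

```python
# A minimal sketch of a locally trainable recognizer, using scikit-learn's
# bundled handwritten-digit dataset. Train on labeled examples, then
# predict new labels along with a confidence score.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # 8x8 grayscale digit images, flattened to 64 values
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)

# predict_proba exposes the model's confidence in each label --
# the same kind of score commercial tagging services return.
probs = model.predict_proba(X_test[:1])[0]
label = probs.argmax()
print(f"predicted digit: {label}, confidence: {probs[label]:.2f}")
```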

Facial Recognition – One of the primary machine learning capabilities is facial recognition. It’s an obvious need in many different situations, from law enforcement to social media to personal image management. Some services can recognize notable people. Others are designed to be trained to recognize specific people.
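As a small illustration of the “train it to recognize specific people” variety, the open-source face_recognition library for Python can be seeded with a single reference photo. The file names here are just placeholders:

```python
# A sketch of recognize-by-example face matching using the open-source
# face_recognition library. File paths are placeholders.
import face_recognition

# "Train" on one known photo by computing a face encoding for it.
known_image = face_recognition.load_image_file("known_person.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Check whether the same person appears in a new photo.
unknown_image = face_recognition.load_image_file("new_photo.jpg")
for encoding in face_recognition.face_encodings(unknown_image):
    match = face_recognition.compare_faces([known_encoding], encoding)[0]
    if match:
        print("tag: known_person")
```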

Object recognition – There are dozens of commercial services that can look at images and identify what is being pictured. These may be generalized services, able to recognize many types of objects, or they may be very specialized machine learning algorithms trained for specific tasks.  
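Here’s a rough sketch of what a generalized recognizer looks like, using a pretrained ImageNet classifier from torchvision. This assumes torch and torchvision are installed, and the photo path is a placeholder:

```python
# A sketch of generalized object recognition with a pretrained
# torchvision classifier (requires torch and torchvision).
from PIL import Image
import torch
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()  # the resizing/normalization the model expects

image = Image.open("photo.jpg").convert("RGB")  # placeholder path
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    probs = model(batch).softmax(dim=1)[0]

# Top-5 candidate tags with confidence scores.
for score, idx in zip(*probs.topk(5)):
    print(f"{weights.meta['categories'][idx]}: {score:.2f}")
```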

Situational analysis – Many of the services that can recognize objects can also make some guesses about the situation shown. This is typically a description of the activity (such as swimming) or of the environment (such as an airport). 

Aesthetic ranking – Computer vision can do some evaluation of image quality. It can find faces, blinks, and smiles, and it can evaluate color, exposure, and composition to make some programmatic ranking assessments.
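The real services use trained models for this, but even a few rule-of-thumb checks convey the idea. This toy sketch uses only Pillow and NumPy, and the thresholds are arbitrary examples rather than industry values:

```python
# A toy sketch of rule-based quality checks using only Pillow and NumPy.
# Real services use learned models; these thresholds are arbitrary examples.
import numpy as np
from PIL import Image

def quick_quality_checks(path):
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    issues = []

    # Exposure: mean brightness far from mid-gray suggests under/overexposure.
    mean = gray.mean()
    if mean < 60:
        issues.append("underexposed")
    elif mean > 195:
        issues.append("overexposed")

    # Contrast: a very low standard deviation suggests a flat image.
    if gray.std() < 25:
        issues.append("low contrast")

    # Sharpness: low variance of a simple Laplacian response suggests blur.
    lap = (
        -4 * gray[1:-1, 1:-1]
        + gray[:-2, 1:-1] + gray[2:, 1:-1]
        + gray[1:-1, :-2] + gray[1:-1, 2:]
    )
    if lap.var() < 100:
        issues.append("possibly blurry")

    return issues or ["no obvious issues"]

print(quick_quality_checks("photo.jpg"))  # placeholder path
```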

Emotional analysis – Images can be analyzed to determine if people’s expressions are happy, sad, mad, etc. Some services may also be able to assign an emotion tag to images based upon subject matter, such as adding the keyword “sad” to a photo of a funeral. 

Optical character recognition – OCR refers to the process of reading any letters or numbers that are shown in an image. Of course, this can be quite useful for determining subject matter and content. 
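For instance, a sketch using pytesseract, a Python wrapper around the open-source Tesseract OCR engine (which must be installed separately), might turn the visible words into candidate tags:

```python
# A sketch of extracting visible text with pytesseract, a Python wrapper
# around the Tesseract OCR engine (Tesseract must be installed separately).
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("street_sign.jpg"))  # placeholder path
# Words read off the image become candidate tags.
candidate_tags = [word.strip(".,!?").lower() for word in text.split() if len(word) > 3]
print(candidate_tags)
```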

Image matching services – Image matching as a technology is pretty mature, but the services built on it are just beginning to emerge. Used on the open web, for instance, image matching can tell you about the spread of an idea or meme. It can also help you find duplicate or similar images within your own system, company, or library.
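One common building block here is perceptual hashing: visually similar images produce similar hashes, so a small Hamming distance between two hashes suggests a match. A sketch using the ImageHash library (paths are placeholders):

```python
# A sketch of duplicate/near-duplicate detection with perceptual hashing,
# using the ImageHash library.
from PIL import Image
import imagehash

hash_a = imagehash.phash(Image.open("original.jpg"))   # placeholder paths
hash_b = imagehash.phash(Image.open("candidate.jpg"))

distance = hash_a - hash_b  # Hamming distance between the two hashes
if distance <= 8:  # threshold is a judgment call; lower = stricter
    print(f"likely match (distance {distance})")
```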

Linked data – As described earlier, there is an effectively unlimited body of knowledge about the people, places, and events shown in an image collection, far more than could ever be stuffed into a database. Linking media objects to external data sources will be a key tool for understanding the subject matter of a photo in a programmatic context.
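As a sketch of the idea, here’s a query against Wikidata’s public SPARQL endpoint that pulls in a fact about a pictured person that was never typed into the image database. This assumes the requests package:

```python
# A sketch of enriching a tag with linked data from Wikidata's public
# SPARQL endpoint (requires the requests package). Starting from a single
# identifier, you can pull in facts your own database never stored.
import requests

query = """
SELECT ?birthLabel WHERE {
  wd:Q1035 wdt:P19 ?birth .            # Q1035 = Charles Darwin, P19 = place of birth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "tagging-demo/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    print("birthplace:", row["birthLabel"]["value"])
```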

Data exhaust – I use this term to mean the personal data you create as you move through the world, which could be used to help establish the meaning and context of an image. Your calendar entries, texts, and emails all contain information that is useful for automatically tagging images. There are difficult privacy issues here, but it’s the most promising way to automatically attach creator-specific knowledge to the object.
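As a toy example, matching a photo’s capture time against calendar entries might look like this. The events are hard-coded stand-ins for data that would really come from a calendar API, with the user’s consent:

```python
# A toy sketch of "data exhaust" tagging: matching a photo's capture time
# against calendar entries. The events here are hard-coded stand-ins for
# data that would really come from a calendar API, with consent.
from datetime import datetime

calendar = [  # (start, end, title) -- hypothetical entries
    (datetime(2018, 6, 2, 10, 0), datetime(2018, 6, 2, 14, 0), "Company picnic"),
    (datetime(2018, 6, 3, 9, 0), datetime(2018, 6, 3, 10, 0), "Board meeting"),
]

def tags_from_calendar(capture_time):
    return [title for start, end, title in calendar if start <= capture_time <= end]

# The capture time would normally come from the photo's EXIF metadata.
print(tags_from_calendar(datetime(2018, 6, 2, 12, 30)))  # -> ['Company picnic']
```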

Natural Language Processing – NLP is the science of decoding language as humans actually use it, rather than by strict dictionary definitions. NLP allows for slang, poor grammar, metaphors and more. It’s what allows you to enter normal human syntax into a Google search and get the right result. It’s what allows a search for “cool dog photo” to bring back a dog that’s cool in the slang sense, rather than just a dog in the snow.
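One small, low-tech slice of this is expanding colloquial query terms into the controlled vocabulary that a tag database actually uses. The synonym table below is a hypothetical stand-in for a real language model:

```python
# A toy sketch of one small piece of NLP for search: expanding colloquial
# query terms into the controlled vocabulary a tag database actually uses.
# The synonym table is a hypothetical stand-in for a real language model.
SYNONYMS = {
    "cool": ["stylish", "sunglasses", "confident"],
    "pup": ["dog", "puppy"],
}

def expand_query(query):
    terms = query.lower().split()
    expanded = set(terms)
    for term in terms:
        expanded.update(SYNONYMS.get(term, []))
    return expanded

print(expand_query("cool pup photo"))
# -> {'cool', 'pup', 'photo', 'stylish', 'sunglasses', 'confident', 'dog', 'puppy'}
```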

Language translation – We’re probably all familiar with the ability to use Google Translate to change a phrase from one language to another. Building language translation into image semantics helps to make it a truly transcultural communication system. 

All of the categories of tagging listed above are available in some form as AI services, which can be used to tag a great number of images very quickly and cheaply. Some of these tags may even be helpful; unfortunately, at the moment, many are wrong or unhelpful. There can be quite a bit of slop here.

Machine learning services are attempting to filter out the slop with confidence ratings: tags can be filtered according to the algorithm’s confidence in each result. While this can be helpful, in my opinion it doesn’t address the more important challenge, which is the integration of human curation with machine learning tools. As you might imagine, this is an issue we’re looking at closely, and we have some promising approaches.
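A simple version of that filtering, with an added human-review queue for the uncertain middle band, might look like this. The tags and thresholds are hypothetical:

```python
# A sketch of confidence-threshold filtering, plus a human-review queue for
# the uncertain middle band -- one simple way to blend curation with ML.
machine_tags = [  # hypothetical output from a tagging service
    {"tag": "dog", "confidence": 0.97},
    {"tag": "frisbee", "confidence": 0.71},
    {"tag": "sheep", "confidence": 0.22},
]

ACCEPT, REVIEW = 0.90, 0.50  # thresholds are a policy decision, not a constant

auto_accepted = [t["tag"] for t in machine_tags if t["confidence"] >= ACCEPT]
needs_review = [t["tag"] for t in machine_tags if REVIEW <= t["confidence"] < ACCEPT]

print("auto-tagged:", auto_accepted)      # -> ['dog']
print("send to a human:", needs_review)   # -> ['frisbee']
```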

In the next post, we’ll look at the way that all these tagging tools can be brought together to create a more comprehensive way to understand image content programmatically.