Tandem Vault Blog

Programmatic Tagging


In today’s post, I outline some of the types of tagging that can be done automatically. 

Let’s face it: (almost) no one wants to spend lots of time tagging images, and part of the appeal of photographic communication is that it lets you avoid tapping out written descriptions of stuff. As image collections grow at an accelerating rate, Artificial Intelligence tools are becoming more important in classifying images. Taken together, these new tools will be an essential part of creating the semantics of imagery.

Programmatic capabilities

Let’s take a look at some of the capabilities that fall under Artificial Intelligence and computational tagging. Some are bundled together within a single service, while others are freestanding capabilities.

Machine learning – Computers can be trained to do all kinds of visual recognition tasks—from identifying species and reading handwriting, to looking for defects in manufactured items. Machine learning is the broad category encompassing any trainable system. Some systems rely on centralized servers and databases, and some can be run locally on your own computer. 

Machine learning tags typically come with a confidence rating. Sometimes these ratings feel a bit overconfident.
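To make that concrete, here’s a minimal sketch of a trainable recognizer that runs entirely on your own machine, using scikit-learn’s bundled handwritten-digit dataset. Commercial tagging services work on the same basic principle at much larger scale, and the confidence score at the end is the same kind of rating I mentioned above.

```python
# A minimal sketch of a locally trainable recognizer, using scikit-learn's
# bundled handwritten-digit dataset. Train on labeled examples, then
# predict new labels along with a confidence score.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # 8x8 grayscale digit images, flattened to 64 values
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)

# predict_proba exposes the model's confidence in each label --
# the same kind of score commercial tagging services return.
probs = model.predict_proba(X_test[:1])[0]
label = probs.argmax()
print(f"predicted digit: {label}, confidence: {probs[label]:.2f}")
```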

Facial Recognition – One of the primary machine learning capabilities is facial recognition. It’s an obvious need in many different situations, from law enforcement to social media to personal image management. Some services can recognize notable people. Others are designed to be trained to recognize specific people.
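As a small illustration of the “train it to recognize specific people” variety, the open-source face_recognition library for Python can be seeded with a single reference photo. The file names here are just placeholders:

```python
# A sketch of recognize-by-example face matching using the open-source
# face_recognition library. File paths are placeholders.
import face_recognition

# "Train" on one known photo by computing a face encoding for it.
known_image = face_recognition.load_image_file("known_person.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Check whether the same person appears in a new photo.
unknown_image = face_recognition.load_image_file("new_photo.jpg")
for encoding in face_recognition.face_encodings(unknown_image):
    match = face_recognition.compare_faces([known_encoding], encoding)[0]
    if match:
        print("tag: known_person")
```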

Object recognition – There are dozens of commercial services that can look at images and identify what is being pictured. These may be generalized services, able to recognize many types of objects, or they may be very specialized machine learning algorithms trained for specific tasks.  
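Here’s a rough sketch of what a generalized recognizer looks like, using a pretrained ImageNet classifier from torchvision. This assumes torch and torchvision are installed, and the photo path is a placeholder:

```python
# A sketch of generalized object recognition with a pretrained
# torchvision classifier (requires torch and torchvision).
from PIL import Image
import torch
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()  # the resizing/normalization the model expects

image = Image.open("photo.jpg").convert("RGB")  # placeholder path
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    probs = model(batch).softmax(dim=1)[0]

# Top-5 candidate tags with confidence scores.
for score, idx in zip(*probs.topk(5)):
    print(f"{weights.meta['categories'][idx]}: {score:.2f}")
```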

Situational analysis – Many of the services that can recognize objects can also make some guesses about the situation shown. This is typically a description of the activity (such as swimming) or of the environment (such as an airport). 

Aesthetic ranking – Computer vision can do some evaluation of image quality. It can find faces, blinks, and smiles, and it can evaluate color, exposure, and composition to make some programmatic ranking assessments.
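The real services use trained models for this, but even a few rule-of-thumb checks convey the idea. This toy sketch uses only Pillow and NumPy, and the thresholds are arbitrary examples rather than industry values:

```python
# A toy sketch of rule-based quality checks using only Pillow and NumPy.
# Real services use learned models; these thresholds are arbitrary examples.
import numpy as np
from PIL import Image

def quick_quality_checks(path):
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    issues = []

    # Exposure: mean brightness far from mid-gray suggests under/overexposure.
    mean = gray.mean()
    if mean < 60:
        issues.append("underexposed")
    elif mean > 195:
        issues.append("overexposed")

    # Contrast: a very low standard deviation suggests a flat image.
    if gray.std() < 25:
        issues.append("low contrast")

    # Sharpness: low variance of a simple Laplacian response suggests blur.
    lap = (
        -4 * gray[1:-1, 1:-1]
        + gray[:-2, 1:-1] + gray[2:, 1:-1]
        + gray[1:-1, :-2] + gray[1:-1, 2:]
    )
    if lap.var() < 100:
        issues.append("possibly blurry")

    return issues or ["no obvious issues"]

print(quick_quality_checks("photo.jpg"))  # placeholder path
```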

Emotional analysis – Images can be analyzed to determine if people’s expressions are happy, sad, mad, etc. Some services may also be able to assign an emotion tag to images based upon subject matter, such as adding the keyword “sad” to a photo of a funeral. 

Optical character recognition – OCR refers to the process of reading any letters or numbers that are shown in an image. Of course, this can be quite useful for determining subject matter and content. 
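For instance, a sketch using pytesseract, a Python wrapper around the open-source Tesseract OCR engine (which must be installed separately), might turn the visible words into candidate tags:

```python
# A sketch of extracting visible text with pytesseract, a Python wrapper
# around the Tesseract OCR engine (Tesseract must be installed separately).
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("street_sign.jpg"))  # placeholder path
# Words read off the image become candidate tags.
candidate_tags = [word.strip(".,!?").lower() for word in text.split() if len(word) > 3]
print(candidate_tags)
```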

Image matching services – Image matching as a technology is pretty mature, but the services built on it are just beginning to emerge. Used on the open web, for instance, image matching can tell you about the spread of an idea or meme. It can also help you find duplicate or similar images within your own system, company, or library.
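One common building block here is perceptual hashing: visually similar images produce similar hashes, so a small Hamming distance between two hashes suggests a match. A sketch using the ImageHash library (paths are placeholders):

```python
# A sketch of duplicate/near-duplicate detection with perceptual hashing,
# using the ImageHash library.
from PIL import Image
import imagehash

hash_a = imagehash.phash(Image.open("original.jpg"))   # placeholder paths
hash_b = imagehash.phash(Image.open("candidate.jpg"))

distance = hash_a - hash_b  # Hamming distance between the two hashes
if distance <= 8:  # threshold is a judgment call; lower = stricter
    print(f"likely match (distance {distance})")
```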

Linked data – As described earlier, there is an effectively unlimited body of knowledge about the people, places, and events shown in an image collection, far more than could ever be stuffed into a database. Linking media objects to external data sources will be a key tool for understanding the subject matter of a photo in a programmatic context.
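As a sketch of the idea, here’s a query against Wikidata’s public SPARQL endpoint that pulls in a fact about a pictured person that was never typed into the image database. This assumes the requests package:

```python
# A sketch of enriching a tag with linked data from Wikidata's public
# SPARQL endpoint (requires the requests package). Starting from a single
# identifier, you can pull in facts your own database never stored.
import requests

query = """
SELECT ?birthLabel WHERE {
  wd:Q1035 wdt:P19 ?birth .            # Q1035 = Charles Darwin, P19 = place of birth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "tagging-demo/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    print("birthplace:", row["birthLabel"]["value"])
```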

Data exhaust – I use this term to mean the personal data you create as you move through the world, which could be used to help establish the meaning and context of an image. Your calendar entries, texts, and emails all contain information that is useful for automatically tagging images. There are difficult privacy issues here, but it’s the most promising way to automatically attach creator-specific knowledge to the object.
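As a toy example, matching a photo’s capture time against calendar entries might look like this. The events are hard-coded stand-ins for data that would really come from a calendar API, with the user’s consent:

```python
# A toy sketch of "data exhaust" tagging: matching a photo's capture time
# against calendar entries. The events here are hard-coded stand-ins for
# data that would really come from a calendar API, with consent.
from datetime import datetime

calendar = [  # (start, end, title) -- hypothetical entries
    (datetime(2018, 6, 2, 10, 0), datetime(2018, 6, 2, 14, 0), "Company picnic"),
    (datetime(2018, 6, 3, 9, 0), datetime(2018, 6, 3, 10, 0), "Board meeting"),
]

def tags_from_calendar(capture_time):
    return [title for start, end, title in calendar if start <= capture_time <= end]

# The capture time would normally come from the photo's EXIF metadata.
print(tags_from_calendar(datetime(2018, 6, 2, 12, 30)))  # -> ['Company picnic']
```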

Natural Language Processing – NLP is the science of decoding language as humans actually use it, rather than by strict dictionary definitions. NLP allows for slang, poor grammar, metaphors and more. It’s what allows you to enter normal human syntax into a Google search and get the right result. It’s what allows a search for “cool dog photo” to bring back a dog that’s cool in the slang sense, rather than just a dog in the snow.
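One small, low-tech slice of this is expanding colloquial query terms into the controlled vocabulary that a tag database actually uses. The synonym table below is a hypothetical stand-in for a real language model:

```python
# A toy sketch of one small piece of NLP for search: expanding colloquial
# query terms into the controlled vocabulary a tag database actually uses.
# The synonym table is a hypothetical stand-in for a real language model.
SYNONYMS = {
    "cool": ["stylish", "sunglasses", "confident"],
    "pup": ["dog", "puppy"],
}

def expand_query(query):
    terms = query.lower().split()
    expanded = set(terms)
    for term in terms:
        expanded.update(SYNONYMS.get(term, []))
    return expanded

print(expand_query("cool pup photo"))
# -> {'cool', 'pup', 'photo', 'stylish', 'sunglasses', 'confident', 'dog', 'puppy'}
```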

Language translation – We’re probably all familiar with the ability to use Google Translate to change a phrase from one language to another. Building language translation into image semantics helps to make it a truly transcultural communication system. 

All of the categories of tagging listed above are available in some form as AI services, which can be used to tag a great number of images very quickly and cheaply. Some of these tags may even be helpful; unfortunately, at the moment, many are wrong or unhelpful. There can be quite a bit of slop here.

Machine learning services are attempting to filter out the slop with confidence ratings: tags can be filtered according to the algorithm’s confidence in each result. While this can be helpful, in my opinion it doesn’t address the more important challenge, which is the integration of human curation with machine learning tools. As you might imagine, this is an issue we’re looking at closely, and we have some promising approaches.
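A simple version of that filtering, with an added human-review queue for the uncertain middle band, might look like this. The tags and thresholds are hypothetical:

```python
# A sketch of confidence-threshold filtering, plus a human-review queue for
# the uncertain middle band -- one simple way to blend curation with ML.
machine_tags = [  # hypothetical output from a tagging service
    {"tag": "dog", "confidence": 0.97},
    {"tag": "frisbee", "confidence": 0.71},
    {"tag": "sheep", "confidence": 0.22},
]

ACCEPT, REVIEW = 0.90, 0.50  # thresholds are a policy decision, not a constant

auto_accepted = [t["tag"] for t in machine_tags if t["confidence"] >= ACCEPT]
needs_review = [t["tag"] for t in machine_tags if REVIEW <= t["confidence"] < ACCEPT]

print("auto-tagged:", auto_accepted)      # -> ['dog']
print("send to a human:", needs_review)   # -> ['frisbee']
```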

In the next post, we’ll look at the way that all these tagging tools can be brought together to create a more comprehensive way to understand image content programmatically.