Metadata Handling

Peter Krogh

Mon Jul 06 2020

Metadata is essential for describing digital objects. Unfortunately, metadata can be lost, overwritten, stripped, or ignored. It can get lost in the transition from one program to another, from one file format to another, or in the creation of derivative files. The computer industry is working to standardize and protect metadata, but that’s a slow process. And even when progress is made, existing files and software don’t necessarily support the new solutions. At the moment, if you want to preserve your metadata, you’ll want to check how it’s handled whenever you add new programs or practices to your workflow.

Resilience

Preserving your metadata is key to the long-term use and value of your media. You’ll want to make sure that it is not lost, even if part of your system fails.

Use a database to manage your collection

The easiest way to make metadata resilient is to collect it all into a single database (or as few databases as possible) and to maintain good backups of that database. If you are working in a desktop application, you can back up the metadata by duplicating the database. And, of course, you’ll want to keep a copy of this database on a different piece of hardware or in a cloud backup, so you will still have a copy in the event of drive failure.
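As a rough illustration of duplicating a database, here is a minimal Python sketch. It assumes the catalog is a single file on disk and that the application is closed while the copy is made; many real catalogs are folders or packages and would need a different approach. The function names and the timestamped naming scheme are my own, not part of any particular application.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_catalog(catalog: Path, backup_dir: Path) -> Path:
    """Copy a single-file catalog into backup_dir and verify the copy.

    Assumption: the catalog is one file and is not being written to
    while this runs.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{catalog.stem}-{stamp}{catalog.suffix}"
    shutil.copy2(catalog, target)  # copy2 also preserves file timestamps
    if sha256_of(catalog) != sha256_of(target):
        raise OSError(f"Backup of {catalog} does not match the original")
    return target
```

Pointing `backup_dir` at a second drive, or at a folder that syncs to a cloud service, covers the "different piece of hardware" requirement.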

Reconnectable

You’ll also need to make sure that your original media files can be easily reconnected with the database in the event of a problem with the database or the media file storage. A pile of unorganized hard drive backups may be very difficult or impossible to reconnect if the primary copy of the files is lost.

The easiest way to ensure reconnection is to make sure your media file backup is an exact duplicate of the primary storage. This can ensure fast and secure restoration of the media and all the associated metadata.
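One way to test whether a backup really is an exact duplicate is to compare the two folder trees file by file. The sketch below is a simplified illustration, not a replacement for dedicated backup or verification software. It compares relative paths and SHA-256 digests; the function names and the shape of the report are my own choices.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict:
    """Map each file's path relative to root to its digest."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_trees(primary: Path, backup: Path) -> dict:
    """Report files that are missing, extra, or different in the backup."""
    a = tree_digests(primary)
    b = tree_digests(backup)
    return {
        "missing_from_backup": sorted(set(a) - set(b)),
        "extra_in_backup": sorted(set(b) - set(a)),
        "different": sorted(p for p in set(a) & set(b) if a[p] != b[p]),
    }
```

If all three lists come back empty, the backup has the same folder structure and file contents as the primary storage, which is what makes reconnection to the database straightforward.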

Reconnection can also be made more resilient by the use of unique filenames or other unique identifiers for media files. Unique filenames/identifiers give you more options for reconnection in the event of a serious problem. For instance, unique filenames make it much simpler to use a CSV to transfer metadata from one application to another, and reconnect it to the proper file. If you have hundreds of files named IMG_1234.jpg, it may be impossible to sort them out properly.
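To show why unique filenames matter for a CSV transfer, here is a small Python sketch that tries to reconnect CSV rows to files on disk by filename alone. The column name `filename` is an assumption made for the example; a real export will have its own column layout.

```python
import csv
from collections import defaultdict
from pathlib import Path


def index_by_filename(media_root: Path) -> dict:
    """Map each lowercased filename to every path that carries that name."""
    index = defaultdict(list)
    for path in media_root.rglob("*"):
        if path.is_file():
            index[path.name.lower()].append(path)
    return dict(index)


def reconnect(csv_path: Path, media_root: Path):
    """Match CSV rows to media files using a hypothetical 'filename' column.

    Returns (matched, ambiguous, missing):
      matched   - dict of file path -> CSV row, for names found exactly once
      ambiguous - filenames found more than once on disk
      missing   - filenames not found at all
    """
    index = index_by_filename(media_root)
    matched, ambiguous, missing = {}, [], []
    with csv_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            candidates = index.get(row["filename"].lower(), [])
            if len(candidates) == 1:
                matched[candidates[0]] = row
            elif candidates:
                ambiguous.append(row["filename"])
            else:
                missing.append(row["filename"])
    return matched, ambiguous, missing
```

With unique filenames, every row lands in `matched`. With many files named IMG_1234.jpg, those rows end up in `ambiguous`, and nothing in the CSV can tell you which file each row belongs to.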

Don’t depend on embedded tags for resilience

It’s also possible to use embedded metadata as a tool to create additional resilience. If your database completely fails, any embedded metadata can probably be reindexed and some portion of the database can be reconstructed. However, I would caution that this provides a false sense of security.

The trends in media management are moving away from embedding all information, and instead towards linking data. This includes information in your application that may be difficult or impossible to embed. For instance, your curation processes will probably include sequencing. There are currently no standardized ways to write this to the files’ metadata. This is typically kept in a database or project file.

In the strongest possible terms, I suggest that you primarily use embedded metadata for portability and not for resilience. Spend your effort making sure you have bulletproof backups of your media catalog.

Portability

Embedding is essential for portability. The easiest and most effective way to make the tags portable is to embed them into the file. In order for this to work smoothly, the current metadata should be embedded in standard fields. Note that while browsing software will always update embedded metadata, catalog/database applications typically only update embedded metadata when you command them to. This means that you may see an updated file in your collection management software which includes all the current keywords, but the original file may have no metadata embedded.

Verify

If you want to make sure the metadata is in the file, the best thing to do is to check. You can open a file in Bridge and look at the File Info to see if it contains the current embedded metadata. You can even see some of the embedded metadata in your OS as shown below.

Your computer’s operating system indexes some of the metadata in a file. On the left, we see the Mac File Info panel, showing keywords, title, instructions and IPTC city created. On the right, we see that Windows 10 shows keywords and rating stars. This is not a comprehensive list of the metadata an OS will index.
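If you would rather check from a script, one rough approach is to look for an XMP packet inside the file. The sketch below relies on two assumptions: that the metadata was embedded as XMP, and that the packet is stored as UTF-8 text between `<x:xmpmeta` and `</x:xmpmeta>` markers, which is common but not guaranteed. It will not find legacy IPTC data stored outside XMP, and the keyword test is a plain substring match rather than a real XML parse, so treat a negative result as a prompt to look closer, not as proof.

```python
from pathlib import Path
from typing import Optional

XMP_START = b"<x:xmpmeta"
XMP_END = b"</x:xmpmeta>"


def extract_xmp_packet(path: Path) -> Optional[str]:
    """Return the first embedded XMP packet as text, or None if none is found.

    Reads the whole file into memory, which is fine for a spot check
    but not for very large files.
    """
    data = path.read_bytes()
    start = data.find(XMP_START)
    if start == -1:
        return None
    end = data.find(XMP_END, start)
    if end == -1:
        return None
    return data[start:end + len(XMP_END)].decode("utf-8", errors="replace")


def has_keyword(path: Path, keyword: str) -> bool:
    """Naive check for a keyword stored as an rdf:li list item in the packet."""
    packet = extract_xmp_packet(path)
    return packet is not None and f"<rdf:li>{keyword}</rdf:li>" in packet
```

A quick use of this is to add a keyword in your catalog software, tell it to write metadata to the file, and then call `has_keyword` on the original to see whether the new keyword actually arrived.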

Try it for yourself

Here is a file with tags in all IPTC fields supported by Photoshop. You can download it and open it with various applications to see what is visible. You can also search on the tags in your operating system to see which ones are indexed.

Sidecar files for raw images

If your image is a raw file, then the metadata is typically written to a sidecar file. In order for the metadata to travel along with the file, the sidecar needs to be current, and the destination application needs to read and understand metadata sidecars. Again, the best way to know if this is going to happen is to test.
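Before moving a folder of raw files, it can help to list any that have no sidecar at all. This sketch assumes sidecars use an `.xmp` extension and are named either after the base filename (IMG_0001.xmp) or after the full filename (IMG_0001.CR2.xmp). The list of raw extensions is only a sample and should be adjusted for your cameras. Note that it only checks that a sidecar exists, not that its contents are current.

```python
from pathlib import Path
from typing import Optional

# Sample list of raw extensions; an assumption, extend it as needed.
RAW_EXTENSIONS = {".cr2", ".cr3", ".nef", ".arw", ".raf", ".orf", ".rw2"}


def sidecar_for(raw_file: Path) -> Optional[Path]:
    """Return the sidecar for a raw file, or None if neither naming style exists."""
    candidates = [
        raw_file.with_suffix(".xmp"),                 # IMG_0001.xmp
        raw_file.with_name(raw_file.name + ".xmp"),   # IMG_0001.CR2.xmp
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None


def raw_files_without_sidecar(folder: Path) -> list:
    """List raw files under folder that have no sidecar next to them."""
    return sorted(
        path
        for path in folder.rglob("*")
        if path.is_file()
        and path.suffix.lower() in RAW_EXTENSIONS
        and sidecar_for(path) is None
    )
```

Any file this reports will arrive at its destination with no sidecar metadata, so it is worth writing the metadata out from your catalog before the transfer and then testing how the destination application reads it.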