Category Archives: Blog

  • Library As Platform

    In the next two weeks, our blog posts will focus on the nature and structure of media libraries. This broader context will help inform the choice of the right tool for any particular task. 

    Integration and connectivity are becoming central opportunities and challenges of the modern media ecosystem. The integration of mobile devices and connected services has become essential to the deployment and value of visual media. This is pushing us from a world of single to one of multi: multiple images, multiple access points, multiple devices, multiple sources of images, and multiple outbound connections. The software, services and methods you use to manage your collection need to support (at least some of) this capability. 

    Connectivity between your collection and other entities is fast becoming standard practice. We expect to access our visual media on our phones or over the web. And we need to integrate the media in a collection with other applications like Slack, Dropbox, layout and design software, publication and product management systems, just to name a few.  

    Our understanding of image content is becoming tied to outside services as well, from the social media graph, to Machine Learning tagging, to linked data. What we can know about an image goes beyond the old concept of metadata.

    Our media libraries are moving from silos to platforms, and connectivity is the key to maximizing the power of that platform. 

    This post is adapted from The DAM Book 3.0 which lays out these principles in comprehensive form.

  • Semantic Image Understanding

    In 1984, Apple unveiled the Macintosh computer, which unleashed a desktop-publishing (DTP) and word-processing revolution. Tools that had previously been used only by a small number of trained professionals were suddenly in the hands of nearly everyone, and soon became essential to many jobs and to everyday functioning in society. Mobile phones are doing the same thing with visual media.

    It’s hard to imagine, but it took 20 years from the start of the DTP revolution until full drive indexing came to your computer. (You know, that thing you take for granted, where you can type a bit of text, and every document on your computer with that text shows up in a list?) In the interim, there was no good way to file and find specific documents, other than file and folder names. It was clunky, time consuming, and very easy to lose important stuff. 

    We are at a similar point in the development of photographic speech. We’re experiencing a flood of new files to manage, but the tools to store, tag and find are lagging far behind. In large part, this is because we don’t have a good notion of the semantics of images. 


    Semantics is loosely defined as the study of meaning in a language. As we think about speaking the language of imagery, it will be essential to develop a more formalized notion of content, context and meaning. This notion needs to factor in the following elements:

    Denotative elements – This is the who, what, when, where, and why of an image’s subject matter. Many of the mature metadata tools have focused on this, starting with IPTC long before the digital photo revolution. The stock photography industry has also pushed this forward, since there was an economic reason to develop better ways to tag and search vast image collections for sales and licensing. AI tools are now driving this forward. 

    Object graph – In a language spoken with the use of objects, the path, proliferation and connections to the object become a deeply important part of understanding the meaning and importance of the image. 

    Creator knowledge and intent – It is often essential to know the intent of the photographer in order to completely understand the meaning or importance of an image. Was an image captured (and shared) in order to show a specific thing?  Was this a good thing or a bad thing? Visual media can hold a lot of information, and it can be really helpful to know which part the creator intended you to pay attention to. 

    Viewer perspective – You can’t determine meaning without factoring in the relationship between the image and the person viewing it. The denotative information and the object graph help to determine if an object has meaning to me. And that meaning may be different from the meaning to others, depending on my personal graph or my cultural perspective.


    Image semantics falls under the rubric of Informatics: the study of the interaction between people and information systems. Ultimately, we need a way to parse through images in order to find the ones that suit our needs. Sometimes this will be easy. As your needs become more complex, as your collection grows larger, and as you seek to use visual media from other collections, the semantics problem becomes harder and more important. 

    There are several structural methods to approach the discovery issue: 

    Simple search and filtering – The familiar tools we have to search our own collections will continue to be important. If you know the date taken, a simple filter may be the easiest way to find the right image. Search and filter will clearly be improved by computational tagging services, which will help as collections expand. 

    Searching within identity-aware services – When you search with Google, the search is assisted by what Google knows about you. This might be the location you’re in, which helps to find locally-relevant results. Siri and Google know a lot more about you, and can, for instance, make a guess as to whether you mean “horses” or “cars” when you search for “racing.” 

    Intelligent local agents – It’s possible that we will also see some type of intelligent search capability that runs locally in private collections and allows the library owner to know about the person searching, rather than keeping all the information locked away in a social media or giant web service. 
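
    To make the first of these methods concrete, here is a minimal sketch of simple search and filtering over a small catalog. The records, file names, dates and tags are invented for illustration; a real system would draw them from embedded metadata and from tagging services.

```python
from datetime import date

# Hypothetical catalog records: capture date plus human- and machine-assigned tags.
catalog = [
    {"file": "IMG_0412.jpg", "taken": date(2017, 6, 3), "tags": {"beach", "dog"}},
    {"file": "IMG_0977.jpg", "taken": date(2017, 6, 4), "tags": {"beach"}},
    {"file": "IMG_1533.jpg", "taken": date(2018, 1, 9), "tags": {"dog", "snow"}},
]

def find(records, start, end, required):
    """Return files captured in [start, end] that carry all required tags."""
    return [r["file"] for r in records
            if start <= r["taken"] <= end and required <= r["tags"]]

# All 2017 photos tagged both "beach" and "dog":
hits = find(catalog, date(2017, 1, 1), date(2017, 12, 31), {"beach", "dog"})
# hits == ["IMG_0412.jpg"]
```

    As computational tagging services add keywords automatically, the same filter reaches images that no one ever hand-tagged.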

    Image semantics is a young field with a lot of growing to do. While the exact path is uncertain, it’s certain to grow, because both the problem and the value of a solution are growing. Making use of new tools for visual semantics will require the collection, preservation and accessibility of the media.

    Next week, we’ll take a look at the media library ecosystem – what your tools need to accomplish and how to evaluate them. 

  • Programmatic Tagging

    In today’s post, I outline some of the types of tagging that can be done automatically. 

    Let’s face it: (almost) no one wants to spend lots of time tagging images, and part of the appeal of photographic communication is to avoid tapping out written descriptions of stuff. As image collections’ growth rates accelerate, Artificial Intelligence tools are becoming more important in classifying images. Taken together, these new tools will be an essential part of creating the semantics of imagery.

    Programmatic capabilities

    Let’s take a look at some of the capabilities that fall under Artificial Intelligence and computational tagging. Some of these are bundled with each other in a service, and some are freestanding capabilities.

    Machine learning – Computers can be trained to do all kinds of visual recognition tasks—from identifying species and reading handwriting, to looking for defects in manufactured items. Machine learning is the broad category encompassing any trainable system. Some systems rely on centralized servers and databases, and some can be run locally on your own computer. 

    Machine learning tags typically come with a confidence rating. Sometimes these ratings feel a bit overconfident.

    Facial Recognition – One of the primary machine learning capabilities is facial recognition. It’s an obvious need in many different situations, from law enforcement to social media to personal image management. Some services can recognize notable people. Others are designed to be trained to recognize specific people.

    Object recognition – There are dozens of commercial services that can look at images and identify what is being pictured. These may be generalized services, able to recognize many types of objects, or they may be very specialized machine learning algorithms trained for specific tasks.  

    Situational analysis – Many of the services that can recognize objects can also make some guesses about the situation shown. This is typically a description of the activity, such as swimming or the type of environment, like an airport. 

    Aesthetic ranking – Computer vision can do some evaluation of image quality. It can find faces, blinks and smiles. It can also check for color, exposure and composition and make some programmatic ranking assessments.

    Emotional analysis – Images can be analyzed to determine if people’s expressions are happy, sad, mad, etc. Some services may also be able to assign an emotion tag to images based upon subject matter, such as adding the keyword “sad” to a photo of a funeral. 

    Optical character recognition – OCR refers to the process of reading any letters or numbers that are shown in an image. Of course, this can be quite useful for determining subject matter and content. 

    Image matching services – Image matching as a technology is pretty mature, but the services built on image matching are just beginning. Used on the open web, for instance, image matching can tell you about the spread of an idea or meme. It can also help you find duplicate or similar images within your own system, company or library. 
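
    One common image-matching approach can be sketched with a difference hash (“dHash”): reduce each image to a tiny grayscale grid, record whether each pixel is brighter than its right-hand neighbor, and compare the resulting bit strings. The grid values below are invented; a real pipeline would downsample actual image files to produce them.

```python
def dhash(pixels):
    """Difference hash over a row-major grayscale grid (here 8 rows x 9 columns).

    Each bit records whether a pixel is brighter than its right neighbor,
    yielding a 64-bit fingerprint for the 8x9 grid.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits; small distances suggest near-duplicate images."""
    return bin(a ^ b).count("1")

# Two downsampled "images": identical except one locally edited row.
original = [[col * 10 for col in range(9)] for _ in range(8)]
variant = [row[:] for row in original]
variant[7].reverse()  # simulate a local edit to the last row
```

    Here hamming(dhash(original), dhash(variant)) is 8 of 64 bits, so the pair would surface as near-duplicates under a typical threshold of around ten bits.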

    Linked data – As described earlier, there is an unlimited body of knowledge about the people, places and events shown in an image collection—far more than could ever be stuffed into a database. Linking media objects to data stacks will be a key tool to understanding the subject matter of the photo in a programmatic context. 

    Data exhaust – I use this term to mean the personal data that you create as you move through the world, which could be used to help understand the meaning and context of an image. Your calendar entries, texts or emails all contain information that is useful for automatically tagging images. There are lots of difficult privacy issues related to this, but it’s the most promising way to automatically attach knowledge specific to the creator to the object.

    Natural Language Processing – NLP is the science of decoding language as humans actually use it, rather than by strict dictionary definitions. NLP allows for slang, poor grammar, metaphors and more. It’s what allows you to enter normal human syntax into a Google search and get the right result. It’s what allows a search for “cool dog photo” to bring back a genuinely cool dog photo instead of just a dog in the snow.

    Language translation – We’re probably all familiar with the ability to use Google Translate to change a phrase from one language to another. Building language translation into image semantics helps to make it a truly transcultural communication system. 

    All of the categories of tagging listed above are available in some form as AI services, which can be used to tag a great number of images very quickly and cheaply. Some of these tags may even be helpful. (Unfortunately, at the moment, a lot of them are either wrong or unhelpful.) There can be quite a bit of slop here.

    Machine learning services are attempting to filter out the slop with confidence ratings. Tags can be filtered according to the algorithm’s confidence in each result. While this can be helpful, in my opinion it’s not addressing the more important challenge, which is the integration of human curation with machine learning tools. As you might imagine, this is an issue we’re looking at closely, and we have some promising approaches.
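
    The confidence-based filtering described above, plus a human-review queue, can be sketched in a few lines. The thresholds and tags are invented for illustration; in practice they would be tuned per service and per task.

```python
def triage(tags, accept=0.90, review=0.60):
    """Split machine tags into auto-accepted, human-review, and discarded piles.

    `tags` is a list of (keyword, confidence) pairs from an AI tagging service.
    """
    accepted = [kw for kw, conf in tags if conf >= accept]
    queued = [kw for kw, conf in tags if review <= conf < accept]
    discarded = [kw for kw, conf in tags if conf < review]
    return accepted, queued, discarded

ai_tags = [("dog", 0.97), ("snow", 0.81), ("wolf", 0.34)]
accepted, queued, discarded = triage(ai_tags)
# accepted: applied automatically; queued: shown to a curator; discarded: dropped
```

    The queued pile is where human curation meets the machine output: a curator confirms or rejects mid-confidence tags rather than re-tagging from scratch.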

    In the next post, we’ll look at the way that all these tagging tools can be brought together to create a more comprehensive way to understand image content programmatically. 

  • Computational Photography

    This week, we’re going to take a look at some of the frontiers of DAM functionality in the mobile/machine-learning era. We’ll start with this post about computational imaging, and then move into computational tagging and then a discussion of evolving visual semantics. 

    Photography has always had the ability to help us see the unseeable. Early photos showed cityscapes where all the people disappeared as they moved through long exposures. Macro photography and remote cameras allow us to view from perspectives that are not possible with the naked eye. Computational photography is now pushing the boundaries of visual rendering in new and remarkable ways. 

    What is computational photography? 

    Of course, all digital photography is computational. It uses a sensor and a computer instead of film and chemicals to make images. Computational imaging is a subset of digital imaging in which the resulting image could only be created by computers. This is frequently accomplished by combining multiple images together. Computational imaging often makes use of depth information to create the image. Let’s take a look at some examples of computational imaging.

    “Traditional” image processing

    The first six items listed have become standard techniques in many image-processing applications, or even inside the camera’s onboard processor. The output from these techniques is, more or less, a traditional photograph. 

    • High Dynamic Range Images (HDR) combine a series of exposures to capture a range of brightness information that can’t be captured in a single photo. The exposures are blended together to create a finished image that may look “normal” or may have some more painterly effects. 
    • Correcting for lens and focus defects can now be done in post-processing, using complex algorithms to repair the problems. 
    • Stitched panoramas are composite images that combine multiple frames to make a new photo. In this case, they offer a field of view and resolution larger than a single frame can allow. 
    • Multi-lens capture is becoming common on mobile phones and other specialized cameras. It uses lenses with different focal lengths to capture wide- and long-lens photos, for instance, and allows you to create lots of different effects in post-production by blending the images.
    • Focus stacking is a technique where macro photos are shot at multiple focusing distances and combined into one frame with greater depth of field than a single frame can provide. 
    • Alternate representations of geometry – Software that is built to represent three-dimensional space is also being used to misrepresent dimensional space. Panorama sequences can be stitched to create “little worlds” like those made by Russell Brown at Adobe. 

    Rich dimensional data

    The next set of techniques captures depth information and combines it with photographic imagery to build three dimensional models. The resulting creations are accessed with smartphones or computers that allow some level of navigation through three dimensional space. 

    • 360-degree cameras shoot two (or more) fisheye photos and then stitch them together to make a seamless “bubble” that can be zoomed and spun inside viewing software. 
    • Single-camera depth mapping is a technique where many photos are taken in rapid succession at different focusing distances. These photos are analyzed for in-focus areas, which can be processed to create a three-dimensional map of the scene. This depth information is overlaid on a traditional flat image. 

    Computer-generated or augmented

    And lastly, we get to the computational techniques that move beyond imaging into new digital-native forms. 

    Computer-Generated Imaging (CGI) tools can use source photos and video to make convincing new creations in ways beyond what’s listed above. CGI can also create images entirely inside a computer, without the need for specific source images.  

    Depth-enabled viewing environments – To fully leverage the depth information that is frequently part of computational imaging, you’ll need a depth-enabled viewing environment. These fall into a number of camps: 

    • Augmented Reality (AR) services combine data, drawings, videos or photographs with a mobile camera image. The AR service typically uses GPS location or object recognition to trigger the display. Pokemon Go is an example of AR used in gaming. And Ikea is using AR to help people visualize how the company’s furniture will look in a room. Both Apple and Google offer Software Development Kits (SDKs) to create AR applications. This has dramatically reduced the price and time needed to make new applications. 
    • Virtual Reality (VR) typically refers to a 3-D system where the viewer can navigate through scenes, looking around in 360 degrees and moving to new viewing positions. VR systems may use a combination of the rich dimensional and computer-generated techniques listed above to create the virtual environment. VR is usually done with a headset like the Oculus, but it can also be done on a smartphone. Google Cardboard is a low-cost tool to turn your smartphone into a VR viewing device. 

    Just the beginning

    The dimensional flavors outlined above offer some compelling capabilities for both documentation and new art forms. Some uses, like real estate marketing, product retailing or crime scene documentation, are already being employed. Many more uses are coming online as industries see the usefulness of reproducing spatial information. And, of course, we see artists using these tools as well, pushing the boundaries of the photographic medium. 

    Computational photography has become a standard part of photographic imaging in the mobile era, and we can expect all of the categories above to continue to grow and merge. Mobile devices are well suited to dimensional image viewing since they have fast processors, position sensors and accelerometers. 

    Most of the computational breakthroughs listed above rely on combining many source images into a new creative work, and some can be applied retroactively to images that were captured earlier. This poses a challenge and an opportunity for the photographic collection. The images must be preserved so they’ll be available when the new technologies arrive, they must be accessible, and it must be possible to find them and bring them together into the new tools. A collection that is well-managed and well-annotated will be able to take advantage of new imaging tools in ways that we can’t even conceive of now.

    Okay, so that was a long one. In the next post in this series, we’ll look at the way that computational tagging can help us make images discoverable and better understand the content of a media collection.

  • Data-Rich Objects

    An image is not just a rectangle of colored dots. As important as the visual content of imagery might be, there is another component becoming even more transformative: the data that lives inside an image file, or is associated with it, provides a treasure trove of meaning and context. It is impossible to overstate how important this is in the language of photography. 

    The connected data is often integral to understanding the meaning of an image. Making use of the connectivity that surrounds an image will be an essential part of leveraging all visual media. As such, we need to make sure we understand the types of data that may be connected to a media object.

    Images, particularly those hosted in some kind of web service, can have rich connectivity of people and ideas attached.

    Let’s take a quick look at the information that may be embedded in an image or connected to it somehow. We’ll start with data created at the time of capture, and then data that can be added as the image moves through the internet. 

    Embedded date and location – Images created with a smartphone will typically include the time of capture and the GPS coordinates of the camera. This can become a key to all kinds of other data.

    Embedded device (and therefore photographer) identifiers – Most images will contain an identifier for the digital camera that made the picture. And it’s a pretty easy step to correlate the device serial number with the person who owns the device (who is typically the photographer). 

    History of sharing and publication – As a person takes action with the photo, additional data is created. The act of selecting and posting to social media services is both an act of curation (e.g., “I want to share this picture”) and an expression of some kind of intent (e.g., “The person in this picture is my friend”). 

    User-created text and tags – The text that accompanies a posting or share of an image tells us even more about the subject of the photo and the intent of the photographer. (Teaser: We think this is an area where we can offer some very cool new tools to help capture intent.)

    AI-created tags – It is increasingly common for images to be processed at some point by Artificial Intelligence services. This may be internal to the social media or DAM service, or it may be user-initiated. This information will continue to increase as new services reprocess old images. 

    Graph – A set of internet relationships can be illustrated by a set of lines that show connectivity. While we usually think of internet graphs as describing relationships between people, graphs can also describe images as they are seen, liked, commented on, and shared. 

    Linked data – All of the above information can also link out to other information (e.g., date and location might pinpoint a known event like a football game). This linkage can come in many forms, including some that may be invisible to you. We should also expect that this linkage will increase as time moves on, as more linkage comes online for people, places, events and media objects. 
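
    As a sketch of how embedded date and location become a key to other data: bucketing timestamps and GPS coordinates into coarse space-time cells groups images from the same event, which can then be matched against outside records (a known football game, for example). The coordinates, grid size and time window below are invented for illustration.

```python
from datetime import datetime

def event_key(lat, lon, taken, grid=0.01):
    """Bucket a capture into a coarse space-time cell (~1 km, one hour).

    Images landing in the same cell are candidates for the same event
    and can be linked to outside data about it.
    """
    cell_lat = round(round(lat / grid) * grid, 4)
    cell_lon = round(round(lon / grid) * grid, 4)
    hour = taken.replace(minute=0, second=0, microsecond=0)
    return (cell_lat, cell_lon, hour)

# Two phones in the same stadium, 35 minutes apart, land in the same cell:
a = event_key(38.8899, -77.0091, datetime(2018, 9, 23, 13, 10))
b = event_key(38.8901, -77.0089, datetime(2018, 9, 23, 13, 45))
```

    A real implementation would use a proper spatial index or geohash rather than naive rounding, but the principle is the same: shared keys turn isolated files into linkable records.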

    One advantage of cloud-based DAM systems is that they are much better positioned to make use of data connectivity. Cloud-native objects have the opportunity to create and make use of connectivity that is highly impractical for on-premise systems.


    Though there is a lot of data that can be associated with an image, much of this is out of reach (and will become available unevenly). 

    • AI tagging is coming to market quickly and will improve constantly. 
    • Linked data services are less mature. Some of the geodata linkage is already here, like place name lookup. ImageSnippets is an interesting example of linkage between images and DBpedia entries. 
    • There are some beta services providing graph services for the open web, but they are very new. 
    • The social media graph will remain controlled by the services, and will only be doled out as they find a reason to in their business models. 

    Next week we’ll look at some of the ways rich media and computational imaging are changing the way we communicate. 

    This post is adapted from The DAM Book 3.0 which lays out these principles in comprehensive form.

  • A language spoken with objects

    One characteristic of the new language of imagery is that it is spoken by means of objects. The image—whether still or moving—is a digital object and must be transmitted for the “speech” to take place. This creates some important corollaries:

    Accessible storage – You must have access to the image in order to use it. You can’t paraphrase a photo or video: you either have it or you don’t. Therefore, it’s essential to have a robust, accessible and searchable repository for your images.

    Centralization of images – If you need to bring images in from multiple sources, then the repository needs to accommodate collection from multiple sources. This could be multiple cameras you own (your phone, your DSLR, etc.), or it could require collecting from multiple people.

    Controlled access for others – If you want to enable multiple people to make use of your images, then you’ll need to allow each of them to have access to the repository. Most people don’t want to give everything away with no restrictions, so controlled access is necessary.

    Tag, search and filter capabilities – With the massive multiplication of images, it’s essential to find the right one among the many stored files.

    Rights and permission management – The legal landscape around photos and videos is much more complicated than with textual speech. Each image may have copyright restrictions, and people or property appearing in images may be subject to rights limitations.

    Image quality and transcoding – Textual speech is very resilient; as long as you have a copy of the words in some form, it’s possible to deploy the speech in its highest quality. However, if you mishandle an image file and degrade the quality, there may be no way to recover it. Resizing, reformatting, changing color, or sending through multiple devices or software packages can all damage media when done incorrectly.

    All of the challenges listed above are the same challenges that photographers and videographers faced at the start of the digital revolution. Now that the rest of society is speaking the language of imagery, these are becoming challenges for every type of communication—and they all multiply exponentially at scale.

    Some of the challenges listed above can only be solved in the cloud portion of a DAM system (accessible storage, collection, access). Some can only be addressed at the governance level (rights and taxonomy policies), and some must be baked into all parts of the system (taxonomy, rights and object handling).

    This post is adapted from The DAM Book 3.0 which lays out these principles in comprehensive form.

  • Pics or it didn’t happen

    Cameraphones are everywhere, and because of this, we have come to expect nearly every notable event to be filmed or photographed. When no imagery exists, we doubt that the event happened at all. There is an entire generation of adults who came of age in the cameraphone era and will express that doubt with the phrase “pics or it didn’t happen”.

    As ever more of our experience is recorded, shared and cataloged, it will become even more important to have visual documentation to describe or prove the existence of an event. As our method to describe becomes more photographic, having access to the images becomes more essential.

    Our reliance on images as “proof” of an event has a downside in the age of convincing visual forgeries. We’re moving far beyond the traditional “Photoshopping” of images and into a world of realistic video fakes. This will make the provenance of a piece of media even more important from a standpoint of trust.

    Pics-or-it-didn’t-happen drives an increase in the importance and velocity of image distribution. In order to meet these expectations, DAM systems will need to be substantially better at gathering imagery from many sources, allowing accurate content and rights tagging, and making the media available for real-time distribution. It’s one of our primary objectives in the upcoming version of Tandem Vault.

  • Mobile>Digital

    When the digital revolution hit the practice of image-making in the early 2000s, it seemed to media professionals like the world had been turned upside down. Cameras looked the same from the front, but everything about shooting, processing and delivering photos and videos changed. The mobile revolution has made the digital revolution look like a small speed bump compared to the seismic changes now happening.

    Mobile makes everyone a photographer/videographer. Digital may have taken film and tapes out of the equation, but cameras were still expensive, and digital workflow was hard and time-consuming. Mobile photography and videography is now ever-present, with cameras in nearly everyone’s pocket and sharing services a click away. With mobile, still and moving images are cheap, plentiful, easy to make, and easy to share.

    We consume imagery differently. We are now more likely to read a web page on a mobile device than on a computer, and each of these is more likely than reading a printed publication. The small screen encourages the use of images instead of text because still and moving images are much more efficient for communication and engagement.

    The mobile ecosystem creates and leverages data richness. Now that we are creating and reading on mobile, it’s easier to attach information to images and to make use of that information. This encourages visual communication that includes a data component, further accelerating the evolution of our photographic language.

    Mobile removes latency. Because mobile images are “born connected,” the time between shooting and sharing has been reduced to a matter of seconds. This has increased engagement, as photos and videos can be shared in real time. Image and video processing apps have reduced the latency between shooting and interpreting the imagery in a personalized or artistic fashion, allowing for a more organic connection between shooting and processing.


    Human beings have been able to “read” images since the days of our prehistoric ancestors’ parietal art. In more modern times, we have come to understand the meaning of images in a certain way because photographic artists, photojournalists, documentarians and filmmakers have developed the tropes of visual storytelling. In the film era, this was particularly hard to accomplish, as each frame of film cost money to purchase and process. Moreover, knowing that you had the story captured on your unprocessed film was a hard-won skill unto itself.

    In the mobile era, the incremental media cost of taking a photo or shooting a video is basically $0, and the instant feedback that reveals success or failure can take the guesswork out of shooting. In the mobile era, taking a picture or filming a video with proper focus and exposure is within reach of nearly everyone—for free—with equipment that is typically close at hand.


    Mobile photography is increasingly blurring the lines between images, video, text and data. While you can certainly find purist enclaves in mobile communities, the trend is heading inexorably to images as multimedia objects. These can include text overlays, stickers, geodata, audio, image sequences, depth information, augmented reality elements, and more. The smartphone is an inherently data-rich environment, and all that extra stuff adds to the ability to communicate.

    People who build and manage media collections must take the changes wrought by mobile into account as they build and select systems. Mobile changes the velocity of communication, creates the need to collect imagery from vastly more people for vastly more use cases, and erases the boundaries between media types and linkable data. It’s a tough challenge, but we are far enough into the mobile revolution to make some good guesses about what’s going to be important.

  • Images as Language

    We are in a new era of visual communication. The smartphone has enabled visual communication at every level – personal, professional, and institutional. This has had a profound impact on every aspect of still and moving images – from cameras and software, to the legal landscape, use cases, and business models. By understanding imagery as a language, we can make sense of these changes more organically.

    The rise of smartphone-based visual communication does not diminish the value of “traditional” photography and videography. As more people learn to speak these visual languages, they become more useful and important, and a new vernacular emerges—delighting some, horrifying others, and befuddling those who are not open to change.

    Think of how the development of the printing press made books ubiquitous. It allowed more people to join in their creation, wresting control away from the few who had mastered the old skills. Rather than diminishing the value of books, this popularization led to an increase in reading and knowledge dissemination; it opened the conversation to all. And as more people became capable of publishing, new voices emerged and new uses and needs were created for the medium. The same is happening today with still and moving imagery.


    Our personal images and videos are our diaries, the expression of our identities, and our memories. Even when they have no monetary value, they are often priceless.

    In a corporate environment, digital imagery is essential to expressing your brand, your history, your products, and the people behind your institution. Photos and videos have also become essential business documents: they can serve as records, and they can document (or protect against) liability. An organization’s digital media are truly its digital assets, with very real value attached. (Consider the cost of a photo or video shoot, or your annual stock-photography and clip-licensing budget.)

    Photos and videos are replacing written observation at an astounding pace, but in most cases, the tools and practices to work with these media are lagging behind the need. The problems that confronted photographers and videographers at the start of the digital age are now spreading to society at large. Yes, there are sharing services that can be effective channels for distribution. But most of these are going to be poor long-term repositories for your asset collection.

    As we examine Digital Asset Management practices, we’ll need to integrate the traditional practices with the increasing role of imagery in all forms of communication. Seeing these forms as part of a new language should help you make sense of what’s happening and prepare for the future.


    DAM services and cloud libraries have traditionally focused their features on communication from marketing and production departments. While those are still the primary drivers of DAM services, the balance is changing, and we need to provide services for these new, much larger use cases. We need to be able to manage the collection, classification and rights-tagging of crowd-sourced media.

    This is a big departure for most DAM systems – in many cases it’s simply been ignored. We are putting this challenge front and center in our product redesign. We feel that the same tools that can make professional visual communication more effective can also be used to power a much broader use case that includes employees, customers, and stakeholders in addition to the traditional constituencies of DAM services.

  • Connected Libraries

    Let’s start the blog posts with an outline of the nature and value of connected media libraries as this will provide some background philosophy that informs what we’re doing at Tandem Vault.

    It’s increasingly important for media libraries to have expanded connectivity. This could mean one person accessing the library from multiple devices, multiple people accessing the same library, or other software making direct use of the media. For corporate and institutional uses, connectivity has become an essential component (and that was before we all started working remotely). Here’s a quick overview of several structural approaches to connectivity:


    • Digital assets on multiple devices and places – We’ve grown accustomed to having access to our digital files from many different places. This has primarily been driven by mobile computing, where we make, consume, and share our visual media.
    • Controlled distribution to others – Distribution has always been an important need for institutional collections. This need has greatly accelerated as the pace of visual communication has increased. Whenever possible, distribution should be done directly from the library in order to capture and leverage usage information.
    • Employee and stakeholder access – Visual media creation and use by employees and stakeholders is expanding rapidly, creating an increased need for access. This requires both the collection and distribution of visual images on a widespread basis.
    • Collecting media from employees and stakeholders – In addition to media access, it’s now commonplace for companies and institutions to gather, centralize, tag and deploy media that is created by employees and other stakeholders.
    • Share with other applications and services – You may also want to share your media collection with other services like social media platforms. While this can be done with a simple export, sharing by a connected export (more on this later) streamlines the process and may allow you to bring valuable information back into the library.
    • Integrate with other systems – If you use a CMS or one of its many variations, like a Product Information Management (PIM) or Building Information Modeling (BIM) system, it may be important to integrate the library directly. This is usually done through an Application Programming Interface (API).
    • Tagging services and connected data – The creation and use of metadata is quickly moving toward reliance on external helpers, including computational tagging services and data that lives in external databases.
    • Provide a firewall – It’s become increasingly dangerous for corporate networks and servers to be open to remote or third-party access. The need for access has made this hard to secure. By using a remote library service for media collection and distribution, risks can be reduced for a company’s main data servers.
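    To make the tagging-services idea concrete, here is a minimal sketch of folding machine-tagging results into a library record. The tag format (`{"label", "confidence"}`) and the field names are hypothetical, not the schema of any particular DAM or tagging service:

    ```python
    def merge_computed_tags(record, service_tags, min_confidence=0.8):
        """Merge tags from an external tagging service into an asset record,
        keeping only confident suggestions and marking their origin so they
        can be reviewed or rolled back later."""
        # Copy the record and its mutable fields so the original stays untouched.
        merged = {
            **record,
            "keywords": list(record.get("keywords", [])),
            "keyword_sources": dict(record.get("keyword_sources", {})),
        }
        for tag in service_tags:
            if tag["confidence"] >= min_confidence and tag["label"] not in merged["keywords"]:
                merged["keywords"].append(tag["label"])
                merged["keyword_sources"][tag["label"]] = "ml-service"
        return merged

    asset = {"id": "IMG_0042", "keywords": ["beach"]}
    suggestions = [
        {"label": "sunset", "confidence": 0.93},
        {"label": "dog", "confidence": 0.41},  # below threshold, dropped
    ]
    print(merge_computed_tags(asset, suggestions)["keywords"])  # ['beach', 'sunset']
    ```

    Recording the source of each keyword is what makes computed tags safe to adopt: human-entered and machine-suggested metadata can be filtered, audited, or purged separately.
    
    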


    There are several structures that are used to connect media libraries to multiple people or devices. Each of these has advantages for some uses, and you’ll find a number of apps and services that utilize multiple methods to enable different kinds of workflow.

    • One application across multiple devices – Connectivity can be managed by a single application. This is a feature of photographer tools like Adobe Photoshop Lightroom. We also see this with file-sharing services like Dropbox, where the installation of the app can manage the distribution of files to others with the same app.
    • A web-accessible library – Another structure for integration is the use of a web-accessible library. A copy of the media can live in a cloud server and be made available to different users. This allows for multi-party access, and in some cases it will allow for multi-party upload to the library.
    • Integration through API – Real integration is typically achieved by using an API to enable one app or service to talk to another. This is a very common approach in modern software and powers most of the connectivity between services.
    • Integration through embedding – It’s also frequently possible for a web-based media library to allow media objects to be embedded in other services.
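    As a sketch of the embedding approach, a web-based library might hand other services an iframe snippet rather than a copy of the file, so the media stays hosted (and governed) by the library. The URL pattern and parameters below are hypothetical, not the embed format of any specific service:

    ```python
    from urllib.parse import urlencode

    def embed_snippet(library_url, asset_id, width=640, height=480):
        """Build an HTML iframe snippet that embeds a library asset in
        another page instead of distributing a separate copy of the file."""
        query = urlencode({"asset": asset_id, "w": width, "h": height})
        src = f"{library_url}/embed?{query}"
        return f'<iframe src="{src}" width="{width}" height="{height}" frameborder="0"></iframe>'

    print(embed_snippet("https://library.example.com", "IMG_0042"))
    ```

    Because the embedded copy always points back to the library, updates, rights changes, and takedowns propagate automatically to every page that uses the asset.
    
    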

    This post is adapted from The DAM Book 3.0 which lays out these principles in comprehensive form.