Metadata
At this point, we all live in the digital universe. We order our goods online, we conduct most of our business digitally, and we socialize and interact virtually. The Internet of Things (IoT) is moving into our homes in the form of talking smoke detectors and texting washing machines. When an online service goes down, we feel it acutely.
The electronic universe is data, and the definition of this data is metadata. Metadata is most often defined as “data about data”, an answer which would most likely earn you a failing grade on a high school English essay question. While recursive, the meta- prefix more or less forces us to this verbal structure. Wikipedia has a slightly more robust definition taken from Merriam-Webster—data that provides information about other data—and then adds “but not the content of the data, such as the text of a message or the image itself.”
The latter part of the definition makes an important distinction. The label on a food product is a good metaphor (or, perhaps, meta-for) for metadata. What’s written on the label is metadata describing the content of the package, and, moreover, the context of the package. The ingredients described on the outside make up the product itself on the inside, while other information, such as the packaged date and expiration date, provides information about the product but doesn’t necessarily describe the product. EXP 2022/10/06 isn’t a can of beans; it’s when the contents of the can of beans may no longer be any good. Similarly, we don’t need information on the can of beans which says “can”, though we might need that definition in the product information management system which indicates in which container we are packaging beans.
If you don’t already love metadata, you should, because it drives many of the things we take for granted online. I’m in the business of taxonomy, and, in many cases, taxonomies (or other structured vocabularies) are metadata.
I don’t think we need to discuss why we need or whether we should be using metadata, but we should cover why metadata really should be in organized structures like taxonomies and ontologies.
Why Structure Metadata?
The use cases for taxonomies are endless, so the why of building taxonomies here is specifically about their use as metadata.
A navigational taxonomy is one of the instances in which taxonomy nodes are a location. You could also think of it as a container in which you place “stuff”. It’s a place you land, probably with a URL and structured as a web page, with information on the page gathered there in relation to the name of that taxonomy node. If I go to a page by navigating down a tree from Appliances > Large Appliances > Refrigerators > French Door Refrigerators, I expect to find information about French door refrigerators. It’s as if this node of a taxonomy is a bucket into which I’ve thrown pictures of refrigerators, text describing the features, the brands, and a whole host of other information related to the topic. From here, I would expect links to other things, whether they be other refrigerators or related products. Navigational taxonomies have their place and assist users by asserting a path they can follow to get to the information they need.
Usually, when I think of taxonomies as metadata, I no longer think of the term nodes as places or buckets which must be filled with information. I think of them as concept labels which can be applied to something else; that something else might be a web page, a business document, a digital asset like a photo or a video, or any other data. The label becomes metadata. More specifically, it is descriptive metadata as defined in the Wikipedia article cited above. Even better, taxonomy management tools allow us to add attributes to the concept label. Really, each concept becomes a record including the metadata value itself and its own metadata providing more information about the concept. It can have administrative metadata, such as when it was created, by whom, and what status it is in. It can also have structural metadata, such as how to express and render this concept along with its other concepts. And, if the concept is descriptive metadata, it can also have its own descriptive metadata, such as a definition or a scope note. (If taxonomy concepts are metadata and attributes are metadata about concepts, are attributes meta-metadata or just more metadata?)
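To make the idea of a concept as a record a little more concrete, here is a minimal sketch in Python. The class names, field names, and example values are all my own assumptions for illustration; they are not the data model of any particular taxonomy management tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Concept:
    """One taxonomy concept stored as a record (hypothetical field names)."""
    label: str                          # the metadata value applied to content
    definition: str = ""                # descriptive metadata about the concept
    scope_note: str = ""                # descriptive metadata about the concept
    created_on: Optional[date] = None   # administrative metadata
    created_by: str = ""                # administrative metadata
    status: str = "draft"               # administrative metadata
    display_order: int = 0              # structural metadata: position among siblings


@dataclass
class Asset:
    """Anything that can be described: a web page, a document, a photo."""
    uri: str
    tags: List[Concept] = field(default_factory=list)


refrigerators = Concept(
    label="French Door Refrigerators",
    scope_note="Use for product pages, not repair guides.",  # invented example text
    created_on=date(2022, 1, 15),                            # invented example value
    created_by="taxonomist",                                 # invented example value
    status="approved",
)

# The concept is applied to the asset as a label; nothing is "placed inside" it.
page = Asset(uri="https://example.com/some-page")
page.tags.append(refrigerators)

print([concept.label for concept in page.tags])
```

The point of the sketch is the direction of the relationship: the page carries the concept as a tag, and the concept carries its own administrative, structural, and descriptive metadata.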
So, if taxonomies are metadata, why do I need to structure them hierarchically at all? Why not have lists of terms which I can use to describe things? Well, you can, but flat lists have particular use cases. For example, if you want to sort products by color, and you can use the seven basic colors, there’s no reason to build a complex, hierarchical taxonomy. A list of seven terms is fine. However, if you are using metadata to describe humanities articles, an alphabetized list of 56,000+ terms is not very helpful unless you know, every time, exactly what term you are looking for. Which, by the way, is exactly the thesaurus structure I started working in when I first came to the controlled vocabulary world in 2002.
And this is why, from the world of science and library classification, we structure metadata in hierarchies. Moreover, we structure metadata in lists and hierarchies in concert connected by relationships. What do these structures provide?
Metadata in Context
As described above, the printed label of a product describes the product and the product’s context. I just recently had work done on my bathroom. The tiles had names, colors, and material. They also had manufacturing dates and batch numbers so I could choose tiles which were, hopefully, produced on the same day and were therefore the same color to the eye. (As an aside, they weren’t.)
Where metadata really gets powerful is when it has context. Not only can a concept label provide information as metadata, the relationships to the things around it, or not even around it, can provide more information. Taking our tile example, I chose porcelain tiles for their durability. If I wanted context, I’d see it has a parent concept, and that parent could be “tiles” or “materials” or even “tile materials” depending on the overall structure I was building. Again, this concept could have children, such as “polished” or “matte”. Or, these values could be related from another part of the taxonomy or a different structure altogether, with a parent like “treatment” or “surface”. It’s not about the one best way to structure these concepts, it’s about the best way to structure the concepts in concert and with context to convey a much richer meaning.
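As a rough sketch of how a parent and child structure supplies context, the tile concepts above can be stored as a simple child-to-parent map. The choice of “Tile materials” as the top concept and the helper names are assumptions made for this illustration; as noted, other structures would be just as valid.

```python
from typing import Dict, List, Optional

# Each concept points to its broader (parent) concept; None marks the top.
# This is one possible arrangement, chosen only for the example.
BROADER: Dict[str, Optional[str]] = {
    "Tile materials": None,
    "Porcelain tiles": "Tile materials",
    "Polished": "Porcelain tiles",
    "Matte": "Porcelain tiles",
}


def path_to_root(label: str) -> List[str]:
    """Return the chain of concepts from the top of the tree down to label."""
    path = [label]
    while BROADER[path[-1]] is not None:
        path.append(BROADER[path[-1]])
    return list(reversed(path))


def narrower(label: str) -> List[str]:
    """Return the direct children of a concept, sorted alphabetically."""
    return sorted(child for child, parent in BROADER.items() if parent == label)


print(" > ".join(path_to_root("Matte")))  # Tile materials > Porcelain tiles > Matte
print(narrower("Porcelain tiles"))        # ['Matte', 'Polished']
```

On its own, “Matte” says very little. With its path, it says it is a matte porcelain tile, which is the extra context the hierarchy provides.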
Where it really gets good is when I can use relationships defined by standards or my own custom relationships. “Porcelain tiles” may have parent and child concepts, and it may also have related concepts. The power of relationships in the world of controlled vocabularies comes when you can define your own to provide even richer context. It is possible to do things like:
Porcelain tiles has color White
Porcelain tiles has brand Brand name
Porcelain tiles has location Store location
And on and on. Once those relationships are established, I can now do things like find all tiles or all material types in the same color or find all tiles by a particular brand or determine whether that brand of tiles is available at my nearby store. Moreover, I can connect the product description to product specification metadata which is not stored in a taxonomy but lives in a product information management system and provides things like dimensions and manufacturing locations. I can connect the concept “porcelain tiles” to images of porcelain tiles, each of which has its own image metadata.
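Custom relationships like the ones listed above can be sketched as subject, relationship, object statements. The following is a minimal illustration in plain Python, not the storage model of any specific tool; the “Ceramic tiles” rows and the function names are hypothetical additions so the queries have something to find.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# (subject, relationship, object) statements
TRIPLES: List[Triple] = [
    ("Porcelain tiles", "has color", "White"),
    ("Porcelain tiles", "has brand", "Brand name"),
    ("Porcelain tiles", "has location", "Store location"),
    ("Ceramic tiles", "has color", "White"),        # hypothetical extra concept
    ("Ceramic tiles", "has brand", "Other brand"),  # hypothetical extra concept
]


def subjects_with(relationship: str, value: str) -> List[str]:
    """Find every concept that has the given relationship to the given value."""
    return sorted({s for s, r, o in TRIPLES if r == relationship and o == value})


def objects_of(subject: str, relationship: str) -> List[str]:
    """Find every value a concept is connected to by the given relationship."""
    return sorted({o for s, r, o in TRIPLES if s == subject and r == relationship})


# All material types available in the same color
print(subjects_with("has color", "White"))             # ['Ceramic tiles', 'Porcelain tiles']
# All tiles from a particular brand
print(subjects_with("has brand", "Brand name"))        # ['Porcelain tiles']
# Where porcelain tiles can be found
print(objects_of("Porcelain tiles", "has location"))   # ['Store location']
```

The same pattern extends to links that leave the taxonomy, such as pointing a concept at a record in a product information management system or at an image with its own metadata.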
Relationships provide important context and expand that context in a way that flat lists of terms, with no ability to include attributes or relationships, cannot.
Another value is inference. If I know a user’s search included porcelain tiles and they clicked on a location near them, I can infer that user lives in or near that location and send coupons to that store location or provide information about related services, such as contractors, in the same zip code.
Not All Metadata Are Taxonomies
So far, I’ve covered taxonomies as metadata, particularly when they are descriptive metadata. How do we know what should be in a taxonomy? Often, this is as simple as common sense. I would never have administrative metadata like “Date Created” in a taxonomy with all possible creation dates listed in a hierarchy grouped by year. Similarly, I would not attempt to maintain lists of employees and their permissions within a taxonomy.
What does or does not go into a taxonomy can be a matter of scale, like all possible dates, or of frequency of change, such as all employees. There are databases and software to manage such information, and this data does not belong in taxonomies. Data which is finite or does not experience a lot of volatility is often a good candidate for taxonomy values. While employee names may change in number and value frequently, a name authority of living and deceased authors does not change as often. Deceased author information will rarely change, and living author information may require updates to a textual description like a biography or the addition of a death date.
Legal metadata, such as a copyright holder or information dictating records retention schedules, may be a good candidate for taxonomies because it benefits from context and relationships. It also changes, but not often or all at once.
If you think taxonomies are mainly structures born out of navigational guidance, revisit them with the idea that they are metadata which has been structured to provide context and an interconnected web of relationships providing power to applications of all types.