Hierarchy Depth
“Taxonomy” has become the de facto term for most types of controlled vocabularies, including flat lists, authority files, simple hierarchies, and thesauri. Whether or not we choose to be pedantic about which vocabulary structures are described as taxonomies, I have seen a shift from single enterprise taxonomies to multiple, inter-related flat lists, taxonomies, and thesauri, often bound by ontological rules. Faceted taxonomies, in which several mutually exclusive top-level categories or separate vocabularies are used as a single enterprise taxonomy, have been in play for years. However, stitching together various independent vocabularies of different types, and using only those vocabularies that suit the use case, seems to be the direction in which people are developing vocabularies.
A common question in modeling taxonomies (of all types) is how deep the taxonomy should be. How many hierarchical levels is the right number? The honest answer is as deep or shallow as you need it to be. That’s not a very satisfying answer, however, and there are real reasons to consider just how deep a taxonomy should be, especially as we move from large, deep taxonomies to more, shallower, inter-related vocabularies.
How Are They Being Used?
The most obvious question to ask when considering taxonomy hierarchy depth is how the taxonomy is being used. An e-commerce navigational taxonomy, for example, should not be so complex that a user can’t navigate to products. On the other hand, the navigational context is typically made clear through the use of breadcrumbs, and users can choose to navigate deeply into a taxonomy to find very granular items. While many users find search more effective for finding exactly the product they are looking for, contextual hierarchical clues can serve to “teach” users how to use the site more effectively. Additionally, e-commerce taxonomies frequently rely on polyhierarchy to allow products to be listed in more than one place, so users can enter from multiple starting locations and still arrive at the products they are trying to navigate to.
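As a minimal sketch of how polyhierarchy produces more than one breadcrumb trail to the same product category, the following Python walks every path from a top-level category down to a concept. The category names, the `BROADER` mapping, and the `breadcrumbs` function are all invented for illustration and do not come from any particular e-commerce platform.

```python
# Hypothetical e-commerce taxonomy: each concept maps to its list of broader
# (parent) concepts. "Trail Running Shoes" has two parents (polyhierarchy).
BROADER = {
    "Footwear": [],
    "Sports": [],
    "Running": ["Sports"],
    "Athletic Shoes": ["Footwear"],
    "Trail Running Shoes": ["Athletic Shoes", "Running"],
}


def breadcrumbs(concept, broader):
    """Return every path from a top-level category down to the concept."""
    parents = broader[concept]
    if not parents:  # a top-level category starts its own trail
        return [[concept]]
    trails = []
    for parent in parents:
        for trail in breadcrumbs(parent, broader):
            trails.append(trail + [concept])
    return trails


for trail in breadcrumbs("Trail Running Shoes", BROADER):
    print(" > ".join(trail))
```

The length of the longest trail is also a simple measure of how deep a user has to navigate to reach a concept.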
For auto-categorization, it is generally better to have more specific concepts applied to increase the chance that content can be found. Often, these more granular concepts are found buried deeply in taxonomies. For end users, using several shallower, faceted taxonomies for this use case has several advantages. One, the facets can be aligned to metadata fields, making it easier for users to understand how things are being tagged. For instance, having fields for Products, Services, and Locations with an aligned faceted structure makes it clear which information should go in each field. Two, training and tuning auto-categorization systems can be very involved. As the complexity of the taxonomies increases, the complexity of deciding just which classification rules are triggered also increases. For the sake of those creating taxonomies and their associated classification rules, several shallow taxonomies can make this easier. Finally, when using a hybrid approach to classification, end users will find it challenging to find the concepts they need in deep, complex taxonomies. With facets aligned to metadata fields and built on shallower structures, users are more likely to tag content accurately.
Without attempting to be exhaustive of all taxonomy use cases, the main considerations are how the taxonomy is being used and who the audience is.
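To illustrate facets aligned to metadata fields, here is a small Python sketch in which each field accepts values only from its own shallow vocabulary. The field names follow the Products, Services, and Locations example above; the vocabulary values and the `misplaced_tags` helper are hypothetical.

```python
# Hypothetical facets: one flat vocabulary per metadata field.
FACETS = {
    "Products": {"Laptops", "Tablets", "Monitors"},
    "Services": {"Repair", "Installation"},
    "Locations": {"Chicago", "Toronto"},
}


def misplaced_tags(record, facets):
    """Return (field, value) pairs whose value is not in that field's facet."""
    problems = []
    for field, values in record.items():
        allowed = facets.get(field, set())
        for value in values:
            if value not in allowed:
                problems.append((field, value))
    return problems


record = {
    "Products": ["Laptops"],
    "Services": ["Repair", "Chicago"],  # a location tagged in the wrong field
    "Locations": ["Toronto"],
}
print(misplaced_tags(record, FACETS))
```

Because each field draws from one short list, a check like this stays simple, which is part of the appeal of shallow facets for both manual tagging and classification rules.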
Are They Consumable?
I used to be an advocate of a single, monolithic, faceted enterprise-wide taxonomy. My thinking was that a single structure (and by single structure, I mean either a single vocabulary with different faceted headings or a combination of separate, faceted taxonomies) was easier to govern and less likely to have erroneous duplicate or nearly duplicate concepts.
I also supported having a single location for each concept rather than using polyhierarchy. This forced the taxonomist and the user to really consider the best place for a concept and what the concept really meant. If the concept’s definition was ambiguous enough to allow it to be in multiple locations, then the concept was too ambiguous to use consistently. Part of this was also driven by the technology: the taxonomy management system supported polyhierarchy, but the consuming systems didn’t. It was easier to put down a blanket policy of not using polyhierarchy at all rather than modify or create workarounds in each downstream platform.
Integration with consuming systems is a justifiable argument for creating shallow or flat taxonomies. Because taxonomy management systems are built to manage taxonomies and most other systems are not, or at least don’t include robust capabilities for doing so, the ability of these systems to use complicated structures is limited. While a system ideally shouldn’t dictate how a taxonomy is built, it can inform the direction. Populating several independent or dependent flat lists in a consuming system from multiple flat lists in a taxonomy management system is generally less difficult than trying to engineer a hierarchy to fit into consuming lists. Engineering a hierarchy to fit is harder, but not impossible.
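One way to picture the work of fitting a hierarchy into a consuming system that only understands lists is to flatten it, carrying the path along as a plain string. This is a sketch under assumptions: the nested-dictionary layout, the concept names, and the `flatten` function are invented for illustration, and a real integration would depend on what the consuming system accepts.

```python
# Hypothetical hierarchy as nested dicts: label -> narrower concepts.
TAXONOMY = {
    "Products": {
        "Hardware": {"Laptops": {}, "Monitors": {}},
        "Software": {},
    },
}


def flatten(tree, path=()):
    """Yield one row per concept with its label, depth, and full path."""
    for label, children in tree.items():
        current = path + (label,)
        yield {"label": label, "depth": len(current), "path": " / ".join(current)}
        yield from flatten(children, current)


rows = list(flatten(TAXONOMY))
for row in rows:
    print(row)
```

The deeper the hierarchy, the more context each flattened row has to carry, which is one reason flat source lists are easier to push downstream.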
Can You Graph It?
Current technology is shifting to graph databases with a choice of either RDF or property graphs. The great thing about graphs is they are semantically intuitive and scalable to include data from other sources relatively easily and with few data modeling changes.
Taxonomy and ontology management systems have long favored graph databases because of their ability to support W3C standards like RDF, SKOS, and OWL, and because they rely on the strength of relationships.
If you are using graph technology and consuming systems can make use of this back end, then your modeling choices have more options. Because many flat or shallow taxonomies working in conjunction lean more heavily on the relationships between them rather than only the relationships within them, the ability to create highly customized relationships expands the way data can be modeled, making it more semantic and allowing for levels of complexity that aren’t inherent to the taxonomy structure.
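The idea of leaning on relationships between flat vocabularies, rather than on hierarchy within them, can be sketched with plain subject-predicate-object triples in Python. This is not RDF tooling and not any product's data model: the vocabularies, the predicate names `appliesTo` and `offeredIn`, and the `subjects` helper are all assumptions made for the example.

```python
# Three hypothetical flat vocabularies with no internal hierarchy.
PRODUCTS = {"Laptops", "Monitors"}
SERVICES = {"Repair", "Installation"}
LOCATIONS = {"Chicago", "Toronto"}
KNOWN = PRODUCTS | SERVICES | LOCATIONS

# Custom relationships between the vocabularies, stored as
# (subject, predicate, object) triples. Predicate names are invented.
TRIPLES = [
    ("Repair", "appliesTo", "Laptops"),
    ("Repair", "appliesTo", "Monitors"),
    ("Installation", "appliesTo", "Monitors"),
    ("Repair", "offeredIn", "Chicago"),
    ("Installation", "offeredIn", "Toronto"),
]

# Every triple should connect concepts that exist in some vocabulary.
assert all(s in KNOWN and o in KNOWN for s, _, o in TRIPLES)


def subjects(predicate, obj, triples):
    """Return all subjects linked to obj by the given predicate."""
    return {s for s, p, o in triples if p == predicate and o == obj}


# Which services for monitors are offered in Toronto?
for_monitors = subjects("appliesTo", "Monitors", TRIPLES)
in_toronto = subjects("offeredIn", "Toronto", TRIPLES)
print(for_monitors & in_toronto)
```

None of the three lists has any depth, yet the question is answerable because the complexity lives in the relationships.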
Constructing numerous vocabularies and defining the specific relationships between them can become very complex. However, models built this way tend to be more flexible and better able to accommodate change and to scale to include new data sources and types. Flexibility in taxonomy models allows businesses to respond more quickly to change.
Hierarchies in the Age of Relations
My perspective on taxonomy construction has shifted in favor of using a larger number of shallow, faceted taxonomies (and lists, and thesauri, etc.) working from a common ontology model to power end user applications.
Taxonomies have grown in maturity in many organizations, and the complexity of environments in which they operate has grown as well. Or, probably more accurately, it is not so much that the environments have grown in complexity as that organizations are using taxonomies for more use cases across more diverse systems. Because taxonomies need to fit more demands across varying platforms and uses, they have had to shift in response, moving from large, less flexible structures to combinations of smaller, more flexible structures.
Another factor informing my change in opinion is the expanding use of graph databases. Relational databases have long been the favored technology, but the ease of use and scalability of graph databases, and the way they can connect data from existing systems, including relational databases, have created an environment in which taxonomy values, their attributes, and their relationships can be used more effectively. Because relationships are less likely to be lost to consuming systems, modeling can make heavier use of relationships between concepts to surface data to end users. We see the increased use of graphs and relationships in knowledge graphs, both within the organization and in public information spaces like the Internet.
Taxonomy depth is still worth considering as you plan your domain ontology models and one or more vocabularies, but consider how these taxonomies are being used and in what systems to help inform your choices.