Taxonomy is easy. You have a list of concepts, or terms, and you put this under that. Often, even if it isn’t completely explicit, the reason for putting this “under” that is intuitive:
Me: Does Algebra “go under” [become a Narrower Term of] Mathematics?
Me: Yes.
Blogpost Taxonomy
Mathematics
-Algebra
This seems obvious; we mean something like: “Is all Algebra Mathematics?” Things can, however, quickly get more complicated:
Me: Is a Collie a Dog?
Me: Yes (as a Collie is a breed of Dogs; all Collies are Dogs).
Me: Is Lassie a Dog?
Me: Yes (or, was; Lassie [the concept of Lassie] is still an instance of a Dog).
Me: So…Dead dogs are also Dogs?
Me: Um. Yes.
Me: Huh.
Blogpost Taxonomy
Dogs
– Collies
– Dead dogs
– Lassie**
Mathematics
– Algebra
This looks a little strange; those three terms do not look like siblings, or as though they’re in the same “category” of things. They clearly don’t all have the same [expressed] relation to Dogs, so what exactly does it mean that they all share the same Broader Term? By the book,* there are three valid circumstances under which a concept (Term B) can “go under,” or become a Narrower Term of, another concept (Term A):
- Generic: Term B is a subclass or subtype of Term A, such as a sub-discipline (Algebra and Mathematics) or a breed (Collie and Dogs);
- Instantive: Term B is an individual instance of Term A (Lassie is a Dog); or
- Partitive: Term B is a part of Term A: like the engine of a car, the trunk of a tree, or…the leg of a dog.
Therefore, the following is a standards-compliant structure:
Blogpost Taxonomy
Dogs
– Collies
– Dead dogs
– Dog legs
– Lassie
Mathematics
– Algebra
“Broader Term” is clearly doing several kinds of work here, which are not distinguished from one another; the Dog branch of our nascent taxonomy above looks like the taxonomist (in this case: me) is asserting that all four of the Narrower Terms (siblings) shown are, in some sense, “of a kind” when they are clearly not. Ideally, a good vocabulary structure would try to distinguish between these relationships; this seems especially true when vocabularies are being used to power things like machine learning, voice search, and other technologies.
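To make the problem concrete, here is a minimal Python sketch of the standards-compliant Dogs branch stored the way a plain taxonomy stores it: one untyped BT link per term. The dictionary layout and function names are hypothetical illustrations, not any particular tool’s format. Reading every link the same way, which is all the structure permits, produces exactly the “of a kind” assertions the branch seems to make:

```python
# A plain taxonomy stores a single, untyped Broader Term (BT) link per term.
# Terms and links mirror the standards-compliant Dogs branch above.
BROADER = {
    "Collies": "Dogs",
    "Dead dogs": "Dogs",
    "Dog legs": "Dogs",
    "Lassie": "Dogs",
    "Algebra": "Mathematics",
}


def narrower_terms(term: str) -> list[str]:
    """Return the Narrower Terms (NTs) of `term`, sorted alphabetically."""
    return sorted(nt for nt, bt in BROADER.items() if bt == term)


def naive_reading(term: str) -> list[str]:
    """Read every BT-NT link identically, as 'NT is a kind of BT'.

    This is the only reading the untyped structure supports; it is wrong
    for instances (Lassie) and for parts (Dog legs).
    """
    return [f"{nt} (kind of) {term}" for nt in narrower_terms(term)]


for sentence in naive_reading("Dogs"):
    print(sentence)
```

Two of the four readings (Dog legs and Lassie) are false, and nothing in the data records that.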
The problem, then, is to clarify the various BT-NT relationships in the Dog branch in the emerging Blogpost Taxonomy. Still going by the book, we have several options.
Option 1: Add intervening terms
We can certainly clean up this branch by interposing some intervening topics; something like:
Blogpost Taxonomy (Dogs branch only)
Dogs
– Canine anatomy
– – Dog legs
– Dead dogs
– – Lassie***
– Dog breeds
– – Collies
– – – Lassie***
– Famous dogs
– – Lassie***
This is effective enough and could be a good solution—particularly if you have lots of other terms that also fit into those second-level categories of the topic Dogs. This is common in many taxonomies, and for many purposes I might advocate this approach; taxonomies constructed this way are friendly for tasks like document indexing and retrieval and site navigation.
This is because, for those purposes, it doesn’t matter that Dogs is a topic and the second-level terms are subtopics, while from the third level down we have things covered by those topics. The interposition of the subcategories smooths over the jarring dissonance of having Lassie and Dog legs as siblings; simultaneously, we maintain the integrity of our all-some, or BT-NT, relationships.
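The polyhierarchy in Option 1, with Lassie under three parents, can be sketched as a map from each term to its list of Broader Terms. The Python below is a hypothetical illustration that assumes the hierarchy contains no cycles (the recursion would not terminate otherwise). It walks every upward path from a term; whichever route you climb from Lassie, you arrive at Dogs, which is the all-some integrity the interposed terms preserve:

```python
# Option 1's Dogs branch as a polyhierarchy: a term may list several
# Broader Terms. Assumes no cycles; the recursion below relies on that.
PARENTS: dict[str, list[str]] = {
    "Canine anatomy": ["Dogs"],
    "Dog legs": ["Canine anatomy"],
    "Dead dogs": ["Dogs"],
    "Dog breeds": ["Dogs"],
    "Collies": ["Dog breeds"],
    "Famous dogs": ["Dogs"],
    "Lassie": ["Dead dogs", "Collies", "Famous dogs"],
}


def paths_to_root(term: str) -> list[list[str]]:
    """Every upward path from `term` to a top term (one with no BT)."""
    parents = PARENTS.get(term, [])
    if not parents:
        return [[term]]
    return [[term] + path for p in parents for path in paths_to_root(p)]


def all_paths_reach(term: str, top: str) -> bool:
    """True if every upward path from `term` passes through `top`."""
    return all(top in path for path in paths_to_root(term))


for path in paths_to_root("Lassie"):
    print(" > ".join(path))
print(all_paths_reach("Lassie", "Dogs"))
```

If any route from a term failed to pass through Dogs, `all_paths_reach` would return False, which makes it a cheap sanity check after restructuring a branch.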
It’s still unsatisfactory, on a philosophical level or if you want to use your taxonomy as the basis for some kind of machine-readable application, that the types of Broader-Narrower relationships are conflated into a single relationship type, one we can’t refine.
Option 2: Facets
We could add some facets to clarify the types of things in our taxonomy: categories for our categories. These can take many forms; the set of permitted facets in a vocabulary effectively constitutes a little authority file of its own.
Blogpost Taxonomy: Facets
Breeds of dog
Dead animals****
Famous dogs
Parts of dogs
Blogpost Taxonomy (Dogs branch only; Hierarchy View)
Dogs
– Collies [Breeds of dog]
– Dead dogs [Dead animals]
– Dog legs [Parts of dogs]
– Lassie [Famous dogs]
Blogpost Taxonomy (Dogs branch only; Facet View)
Facet
– Term [BT: Broader Term]
—————————————————————————–
Breeds of dog
– Collies [BT: Dogs]
Dead animals
– Dead dogs [BT: Dogs]
Famous dogs
– Lassie [BT: Dogs]
Parts of dogs
– Dog legs [BT: Dogs]
Facets are useful and, I think, underutilized, as faceting allows a third dimension in relating concepts. Here we’re asserting not only that Lassie is a Dog, but also that Lassie belongs to the “category” of Famous dogs. Again, for some vocabulary use cases this architecture is an excellent choice.†
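Mechanically, a facet is one more attribute stored on each term, and the Facet View is a regrouping of the same records. A minimal Python sketch, assuming a hypothetical (term, broader term, facet) tuple per record rather than any particular tool’s format:

```python
from collections import defaultdict

# Each term in the Dogs branch carries its Broader Term and one facet,
# as in the Hierarchy View above.
TERMS = [
    ("Collies", "Dogs", "Breeds of dog"),
    ("Dead dogs", "Dogs", "Dead animals"),
    ("Dog legs", "Dogs", "Parts of dogs"),
    ("Lassie", "Dogs", "Famous dogs"),
]


def facet_view(terms):
    """Regroup (term, broader_term, facet) records by facet, alphabetically."""
    by_facet = defaultdict(list)
    for term, bt, facet in terms:
        by_facet[facet].append(f"{term} [BT: {bt}]")
    # Sort facets and the entries within each facet for a stable display.
    return {facet: sorted(entries) for facet, entries in sorted(by_facet.items())}


for facet, entries in facet_view(TERMS).items():
    print(facet)
    for entry in entries:
        print("-", entry)
```

Notice that the BT value is identical in every record: the facet describes the term, not the link between the term and Dogs.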
But isn’t this solution merely recasting the interpolation of terms in Option 1 (without the benefit of tidying up the hierarchy)? The problem—casting terms in relation to one another without a way to really define those relationships—remains. We have plenty of ways to describe and categorize terms; we want the same control over the relationships between them.
Option 3: Subtypes of Broader-Narrower Term Relationships
A seldom-used design (also outlined in Z39.19) takes advantage of the three valid BT-NT relationships mentioned above (Generic, Instantive, Partitive) and provides each with its own designation:
- Broader Term Generic (BTG) – Narrower Term Generic (NTG)
- Broader Term Instantive (BTI) – Narrower Term Instantive (NTI)
- Broader Term Partitive (BTP) – Narrower Term Partitive (NTP)
…essentially allowing three types of Broader (and Narrower) Terms, to help clarify exactly the ambiguous relationship type we’ve been poking at, like so:
Blogpost Taxonomy [specific BT-NT version]
Dogs
– NTG Collies
– NTG Dead dogs
– NTI Lassie
– NTP Dog legs
Mathematics
– NTG Algebra
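One convenience of typed relationships is that the reciprocal entries can be derived rather than maintained by hand: each NTG, NTI, or NTP entry is the mirror image of a BTG, BTI, or BTP assertion. A minimal Python sketch of that bookkeeping, assuming a hypothetical (narrower, kind, broader) record format, reproduces the Dogs branch above:

```python
from enum import Enum


class Kind(Enum):
    """The three BT-NT relationship kinds, keyed by their one-letter codes."""
    GENERIC = "G"
    INSTANTIVE = "I"
    PARTITIVE = "P"


# Broader-term assertions for the typed branch: (narrower, kind, broader).
BT_ASSERTIONS = [
    ("Collies", Kind.GENERIC, "Dogs"),
    ("Dead dogs", Kind.GENERIC, "Dogs"),
    ("Lassie", Kind.INSTANTIVE, "Dogs"),
    ("Dog legs", Kind.PARTITIVE, "Dogs"),
    ("Algebra", Kind.GENERIC, "Mathematics"),
]


def narrower_display(broader: str) -> list[str]:
    """Derive the reciprocal NTG/NTI/NTP entries for one Broader Term.

    Only BT assertions are stored; every NT entry is computed from them,
    so a BTx and its reciprocal NTx cannot drift apart.
    """
    return [
        f"NT{kind.value} {narrower}"
        for narrower, kind, bt in BT_ASSERTIONS
        if bt == broader
    ]


print("Dogs")
for line in narrower_display("Dogs"):
    print("-", line)
```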
Combining this approach with Option 1’s interpolations yields a somewhat satisfactory result:
Blogpost Taxonomy (Dogs branch only)
Dogs
– NTG Canine anatomy
– – NTG Dog legs
– NTG Dead dogs
– – NTI Lassie***
– NTG Dog breeds
– – NTG Collies
– – – NTI Lassie***
– NTG Famous dogs
– – NTI Lassie***
This is a little cumbersome, but it helps distinguish between the various NT relationships—to a point. (Note also, interestingly, that we’ve lost the NTP-relationship for Dog legs by changing its parent.)
The use of this (three-kinds-of-BT/NT-relationships) architecture is not widespread; it’s hard for machines to know what to do with this structure, and it’s really a half-measure towards what we’ve been interrogating all along: to precisely specify and name the relationships between the concepts in our vocabulary. I theorize that this architecture was essentially superseded by the spread of graphs.
Option 4: From Taxonomy to Graphs
If we conceive of our vocabulary as a graph (which it is: technically and specifically, a directed acyclic graph‡), with terms as the nodes and their relationships as the edges, we can recast our problem: we want to be able to name the edges.
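In that framing, a named edge is simply a (source, name, target) triple. The Python sketch below uses hypothetical, illustrative edge names ("kind of", "instance of", "part of"; not a standard vocabulary) and checks the “acyclic” half of the claim with Kahn’s algorithm, which can remove every node in dependency order only if the graph has no cycles:

```python
from collections import defaultdict, deque

# Terms are nodes; every edge carries a name. The edge names are
# hypothetical labels chosen for illustration, not a standard vocabulary.
EDGES = [
    ("Collies", "kind of", "Dogs"),
    ("Dead dogs", "kind of", "Dogs"),
    ("Famous dogs", "kind of", "Dogs"),
    ("Lassie", "instance of", "Collies"),
    ("Lassie", "instance of", "Dead dogs"),
    ("Lassie", "instance of", "Famous dogs"),
    ("Dog legs", "part of", "Dogs"),
    ("Algebra", "kind of", "Mathematics"),
]


def is_acyclic(edges) -> bool:
    """Kahn's algorithm: True if the directed graph has no cycles."""
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for source, _label, target in edges:
        out_edges[source].append(target)
        in_degree[target] += 1
        nodes.update((source, target))
    # Start from nodes with no incoming edges, then peel them off one by one.
    queue = deque(n for n in nodes if in_degree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for target in out_edges[node]:
            in_degree[target] -= 1
            if in_degree[target] == 0:
                queue.append(target)
    # Nodes never peeled off lie on a cycle or are reachable from one.
    return visited == len(nodes)


def describe(term: str) -> list[str]:
    """Every outgoing edge from `term`, with its name spelled out."""
    return [f"{s} --{label}--> {t}" for s, label, t in EDGES if s == term]


print(is_acyclic(EDGES))
for line in describe("Lassie"):
    print(line)
```

In this sketch Dog legs keeps its own "part of" edge alongside the other relationships, and adding an edge such as Dogs "kind of" Lassie would make `is_acyclic` return False.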
And so we come to the reason we’re hearing about the “rise of knowledge graphs” in recent books and articles and at conferences: it’s solving a problem in vocabulary architecture that’s becoming increasingly important as the foundation of AI and other technologies (not to mention search).
Essentially, this approach uses ontological RDF structures to model controlled vocabularies (terms as well as their relationships; nodes as well as edges) in a way that is machine-readable and interoperable.
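As a rough illustration of what that can look like, the Python sketch below writes a few of the Dogs relationships as RDF in Turtle syntax using the W3C SKOS vocabulary (skos:Concept, skos:prefLabel, skos:broader). SKOS core offers only an untyped skos:broader, so the generic, instantive, and partitive properties here live in a hypothetical ex: namespace and are declared as sub-properties of skos:broader; both the namespace and the property names are assumptions:

```python
# Serialize typed BT relationships as Turtle using the W3C SKOS vocabulary.
# The "ex:" namespace and its three properties are hypothetical.
PREFIXES = """\
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/vocab#> .
"""

TYPED_PROPERTIES = {"G": "broaderGeneric", "I": "broaderInstantive", "P": "broaderPartitive"}

# (narrower, kind code, broader), following the BTG/BTI/BTP designations.
RELATIONS = [
    ("Collies", "G", "Dogs"),
    ("Lassie", "I", "Dogs"),
    ("Dog legs", "P", "Dogs"),
]


def local_name(label: str) -> str:
    """Turn a term label into a URI-safe local name ('Dog legs' -> 'Dog_legs')."""
    return label.replace(" ", "_")


def to_turtle(relations) -> str:
    lines = [PREFIXES]
    # Declare each typed property as a refinement of skos:broader, so an
    # RDFS-aware consumer can still infer a plain skos:broader link.
    for prop in TYPED_PROPERTIES.values():
        lines.append(f"ex:{prop} rdfs:subPropertyOf skos:broader .")
    lines.append("")
    # One skos:Concept node per distinct term label.
    concepts = sorted({t for n, _k, b in relations for t in (n, b)})
    for label in concepts:
        lines.append(f'ex:{local_name(label)} a skos:Concept ; skos:prefLabel "{label}"@en .')
    # One named edge per relationship.
    for narrower, kind, broader in relations:
        lines.append(f"ex:{local_name(narrower)} ex:{TYPED_PROPERTIES[kind]} ex:{local_name(broader)} .")
    return "\n".join(lines)


print(to_turtle(RELATIONS))
```

Because of the rdfs:subPropertyOf declarations, each named edge refines a plain Broader Term link rather than replacing it.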
Building graphs is, admittedly, more difficult than building taxonomies; modeling vocabularies using this richer but more complex architecture requires care and precision, which equals time and effort.
But: we get to name the edges.
More on this in Part 2 of this post.
Notes
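* ANSI/NISO Z39.19 (2005; reaffirmed 2010), Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies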
** Obviously Lassie could also be an NT of Dead dogs and Collies.
*** Clearly, I could not resist the polyhierarchy.
**** This taxonomy is shaping up to be rather more morbid than I had planned.
† The Getty vocabularies, for example, famously make extensive and complex use of facets.
‡ I am, after all, a taxonomist.