Taxonomy Lifecycles
I’ve gotten many questions lately from both current and prospective customers about taxonomy lifecycle management as part of the taxonomy governance process. The questions concern both the lifecycle of individual concepts and the overall versioning and archiving of entire vocabularies.
While each organization is different, some fundamental requirements and basic governance practices are common when managing taxonomies and ontologies.
Concept Lifecycle Management
Controlled vocabulary concepts have a lifecycle. Like any living thing, they are born, they age, they make horrible mistakes and embarrassing gaffes, they fall in love and merge with other concepts or are subsumed by them, they grow old and retire, and many die of old age. So it goes.
When taxonomists tell you a taxonomy is never done, it is because concepts are constantly being added, edited, merged, retired, and deleted, and are moving between states in a concept lifecycle. It’s not enough to simply let a concept be born: it must be active and alive to be of any real use. In most taxonomy management software, a concept moves through several statuses over its lifecycle, usually something like Candidate, Pending, Approved, Published, Unapproved or Withdrawn, and Deleted. The names and the number of states vary, but the idea is the same: a concept passes through a workflow of states that depend on human review and confirmation. The moves between states can be manual or automatic.
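To make the idea of a lifecycle workflow concrete, here is a minimal Python sketch of a concept moving through statuses. The status names come from the list above, but which transitions are allowed is an illustrative assumption, not the workflow of any particular taxonomy management system; the `Concept`, `Status`, `ALLOWED`, and `move_to` names are likewise hypothetical. Each status change also records what changed, when, and by whom, the kind of audit information discussed below.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    CANDIDATE = "Candidate"
    PENDING = "Pending"
    APPROVED = "Approved"
    PUBLISHED = "Published"
    WITHDRAWN = "Withdrawn"
    DELETED = "Deleted"


# Hypothetical workflow: the statuses a concept may move to from each status.
ALLOWED = {
    Status.CANDIDATE: {Status.PENDING, Status.DELETED},
    Status.PENDING: {Status.APPROVED, Status.CANDIDATE, Status.DELETED},
    Status.APPROVED: {Status.PUBLISHED, Status.WITHDRAWN},
    Status.PUBLISHED: {Status.WITHDRAWN},
    Status.WITHDRAWN: {Status.PENDING, Status.DELETED},
    Status.DELETED: set(),
}


@dataclass
class AuditEntry:
    action: str          # what was changed
    user: str            # who changed it
    timestamp: datetime  # when it was changed


@dataclass
class Concept:
    pref_label: str
    status: Status = Status.CANDIDATE
    log: list[AuditEntry] = field(default_factory=list)

    def move_to(self, new_status: Status, user: str) -> None:
        """Change status if the workflow allows it, and log the change."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        self.log.append(AuditEntry(
            action=f"status {self.status.value} -> {new_status.value}",
            user=user,
            timestamp=datetime.now(timezone.utc),
        ))
        self.status = new_status
```

For example, `Concept("Solar energy")` starts as a Candidate; `move_to(Status.PENDING, user="jdoe")` succeeds, while jumping straight from Candidate to Published raises a `ValueError`, mirroring the requirement that a concept pass through review before it goes live.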
Even after a concept is approved, it may need a label change for many reasons, including a misspelling or a change in accepted parlance. Beyond simple edits, two different concepts may need to be merged, with the preferred label taking over and the old label subsumed under it so that links to the old concept, and to content tagged with it, are maintained. Some terms must be retired because they no longer serve an active purpose. Others must be deleted completely. Together, these label changes and state changes make for an active and dynamic vocabulary.
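A merge can be sketched in the same spirit. This minimal Python example assumes a simple in-memory vocabulary; the `Vocabulary`, `merge`, and `resolve` names are hypothetical. The surviving concept keeps its preferred label, the absorbed concept’s labels are kept as alternative labels, and a redirect from the old concept ID means content tagged with it still resolves to a live concept.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str
    pref_label: str
    alt_labels: set[str] = field(default_factory=set)


@dataclass
class Vocabulary:
    concepts: dict[str, Concept] = field(default_factory=dict)
    # Maps a merged-away concept ID to the ID of the concept that absorbed it.
    redirects: dict[str, str] = field(default_factory=dict)

    def add(self, concept: Concept) -> None:
        self.concepts[concept.concept_id] = concept

    def merge(self, keep_id: str, absorb_id: str) -> None:
        """Merge absorb_id into keep_id; the kept concept's preferred label wins."""
        keep = self.concepts[keep_id]
        absorbed = self.concepts.pop(absorb_id)
        # The old preferred label survives as an alternative label on the kept concept.
        keep.alt_labels |= {absorbed.pref_label} | absorbed.alt_labels
        keep.alt_labels.discard(keep.pref_label)
        self.redirects[absorb_id] = keep_id
        # Re-point earlier redirects that targeted the absorbed concept.
        for old, target in self.redirects.items():
            if target == absorb_id:
                self.redirects[old] = keep_id

    def resolve(self, concept_id: str) -> Concept:
        """Follow a redirect so content tagged with an old ID still finds a live concept."""
        return self.concepts[self.redirects.get(concept_id, concept_id)]
```

Re-pointing older redirects keeps lookups to a single hop even after a concept has been merged more than once.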
State changes are accompanied by an activity or audit log that records what was changed, when it was changed, and by whom. This information is critical for taxonomists because it supplements change notes with evidence of what happened to a concept and when. Especially during taxonomy team expansion or knowledge transfers, concept status and the associated audit log are essential to understanding the state of the taxonomy as a whole.
Taxonomy and Ontology Version Management
Since active vocabularies are in a constant state of development (or should be!), what reasons do we have to version or archive the vocabulary?
One reason is the practical need for a backup in case of a catastrophic failure in the technology stack. Most organizations build redundancy into their backup procedures and can restore systems and content, but there is always a chance that content is lost and not retrievable. In addition, even if a controlled vocabulary is retrievable, its structure may not be captured in backups in a way that allows it to be easily restored to its original form.
Another reason is organizational memory. Most taxonomy management systems have audit trails and change logs, but while these make it easy to look at a single concept and its history, it’s often quite difficult to use them to conceptualize larger changes, such as the restructuring of entire taxonomy branches. This becomes more apparent with real-world use cases. For example, I once worked with a client whose taxonomy drove the front-end website navigation. They were proposing significant changes to the website structure in order to drive more traffic. Not a bad idea, but their proposed structure was nearly identical to their website navigation several years earlier, before it had been changed. Were they moving forward or reinventing the wheel…or spinning it until it came back to exactly where it had been? A historical record would help guide decision-making in a situation like this.
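One way a historical record can inform a decision like this is by comparing a proposed structure against an archived one. The Python sketch below assumes both versions are available as nested dictionaries of labels (an illustrative format, not a standard export), and the `paths` and `compare` names are hypothetical. It flattens each version into root-to-node label paths and reports what was added, what was removed, and a simple overlap score (Jaccard similarity).

```python
def paths(tree: dict, prefix: tuple[str, ...] = ()) -> set[tuple[str, ...]]:
    """Flatten a nested {label: children} tree into the set of root-to-node paths."""
    result = set()
    for label, children in tree.items():
        path = prefix + (label,)
        result.add(path)
        result |= paths(children, path)
    return result


def compare(old: dict, new: dict) -> dict:
    """Report added and removed paths and the Jaccard similarity of two versions."""
    old_paths, new_paths = paths(old), paths(new)
    union = old_paths | new_paths
    return {
        "added": sorted(new_paths - old_paths),
        "removed": sorted(old_paths - new_paths),
        "similarity": len(old_paths & new_paths) / len(union) if union else 1.0,
    }
```

A similarity close to 1.0 between a proposal and a years-old snapshot would suggest the “new” structure is largely the old one.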
Finally, what happens when the taxonomy management platform changes? Politics and budgets may drive a move from a spreadsheet taxonomy to a taxonomy management system, and then to another taxonomy management system. In the worst-case scenario, an organization may have its taxonomy in production when the system or project loses funding and is pulled. If the system goes, the taxonomy should not go with it.
So, how do you manage taxonomy versions? There are many options depending on the system.
One way is to replicate the taxonomy or ontology and save it somewhere. For some, this is literally making a copy of a spreadsheet file. For others, this may mean using a taxonomy replication feature within a taxonomy management system and storing the copy in the system or in a designated location. As anyone in real estate can tell you, location matters. If a taxonomist stores backup or archived taxonomy versions on his or her local machine, what happens if that person moves on or their computer is stolen? Bye-bye data. Similarly, files have a nasty habit of vanishing into file structures or content management systems that only a single user or IT administrator can access. That IT administrator won’t necessarily know what the content is and, as before, if there is taxonomist turnover, the location and access go out the door.
If the taxonomy management system is integrated with another system, such as a content or digital asset management system, the integration may provide the backup. Frequently, this copy reflects only the most recent version output from the taxonomy management system, but if all else fails, the consuming system may be another place from which to retrieve the full taxonomy after a major failure or when a system is sunset.
Reporting is a good option if the taxonomy management system supports it. The ability to create a report of the full taxonomy structure and have this automatically output on a given date (the first of the month, for instance) and delivered to a documented and accessible location takes the manual burden off the taxonomy owners. The reporting process can be established as part of the overall taxonomy governance process and documented in a master taxonomy governance document.
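As a rough illustration of the output side of such a report, the following Python sketch writes a full taxonomy structure to a dated JSON file in a shared archive directory and lists existing snapshots in date order. The JSON format, the `taxonomy-YYYY-MM-DD.json` naming scheme, and the function names are assumptions for illustration; the scheduling itself (for example, on the first of the month) would come from the taxonomy management system’s own reporting feature or an external scheduler such as cron.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def export_snapshot(taxonomy: dict, archive_dir: Path, as_of: Optional[date] = None) -> Path:
    """Write the full taxonomy to archive_dir/taxonomy-YYYY-MM-DD.json and return the path."""
    as_of = as_of or date.today()
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_dir / f"taxonomy-{as_of.isoformat()}.json"
    # Never overwrite an existing snapshot: each archived version should stay as it was.
    if path.exists():
        raise FileExistsError(f"snapshot already exists: {path}")
    path.write_text(json.dumps(taxonomy, indent=2, sort_keys=True), encoding="utf-8")
    return path


def list_snapshots(archive_dir: Path) -> list[Path]:
    """Return archived snapshots oldest first (ISO dates sort chronologically as text)."""
    return sorted(archive_dir.glob("taxonomy-*.json"))


def load_snapshot(path: Path) -> dict:
    """Read an archived snapshot back into memory."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Refusing to overwrite keeps every archived version intact, and ISO dates in the file names mean a plain sort yields the history in order.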
Good for Business
I can’t speak for all taxonomists, but if they’re anything like me, they are neat freaks and hoarders. A well-designed taxonomy is often cared for like a work of art and the creator of that work will want to make sure it is protected and its history is traceable. It’s as important to know where the business has been as where it is going.
More importantly, mission-critical processes may depend on the taxonomy for content tagging and content pushes in a production environment. If systems fail, the taxonomy still has work to do, so it must be managed in a way that allows it to be recreated and restored.