Daniela’s latest data visualization post and a conversation earlier today with one of the Dow Jones Factiva project managers got me thinking about a presentation I did a couple of years back for the Taxonomy Community of Practice. Seth Earley, TaxoCoP’s founder and chief moderator, had asked me to talk about my experiences with the Google Search Appliance, and part of my argument was not to simply dismiss it – given the cost, ease of implementation, and quality of results, it may (and does) suit the needs of many organizations. I think he was expecting me to diss it as just another “silver bullet”; I did list its shortcomings, but all in all I could find nothing wrong with our investment. About a year and a half later, he asked me to do another taxonomy & search TaxoCoP call; this time I think he expected me to simply say Google was great! I didn’t – it’s fun to keep Seth on his toes! 😉
Search applications, both inside the enterprise and out on the World Wide Web, had stagnated while the players talked amongst themselves about the “Google problem.” Categorization tools – using proprietary algorithms, rules-based coding, and the obligatory WordNet or digital Roget’s – had only got us so far. Spell checking, stemming, recommendation engines – meh. Were we seeking a cure for the symptoms, or the disease? Solving the “findability” problem was going to take a new tactic.
Being fairly good at self-criticism, I started wondering if we were asking the right questions. If search breaks, are we dead in the water? Can we get better results if we don’t get better content to index? Isn’t search just an access point – a retrieval method? It’s NOT an operating mechanism for content management. It’s reactive rather than proactive. And the kicker – if search requires entering synonyms, best bets, metadata rules and regular expressions to deliver the best user experience, doesn’t that sound like there’s a grassroots schema and taxonomy in the system already? The processes were flawed.
I have always preferred to have metadata embedded in an object. Think of the title page of a book, the properties of a digital file. If metadata is stored in a repository separate from the object, what happens when the link between them inevitably breaks? Once embedded, the metadata is always available to any agent that wishes to use it.
– Good objects with no metadata (context) are like a Secret Garden.
– Good objects with can’t-be-bothered metadata are like a maze.
– Lots of places to put things creates silos.
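To make the embedded-versus-separate point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the JSON "envelope" that carries metadata inside the object, the `external_repo` dictionary standing in for a separate metadata repository keyed by file location, and the file names. It is not how any particular CMS stores metadata; it only shows what happens to each approach when an object is moved.

```python
import json
import tempfile
from pathlib import Path


def write_object(path: Path, body: str, metadata: dict) -> None:
    """Store the metadata inside the object itself (hypothetical JSON envelope)."""
    path.write_text(json.dumps({"metadata": metadata, "body": body}), encoding="utf-8")


def read_embedded(path: Path) -> dict:
    """Any agent that can open the object can also read its metadata."""
    return json.loads(path.read_text(encoding="utf-8"))["metadata"]


def demo() -> tuple:
    """Move an object and compare the external lookup with the embedded read."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "report.json"
        meta = {"title": "Q3 Search Review", "subject": ["enterprise search"]}
        write_object(original, "report text", meta)

        # Separate repository (assumed): metadata keyed by the object's location.
        external_repo = {str(original): meta}

        # The object gets moved or renamed, as files tend to be.
        moved = root / "archive-report.json"
        original.rename(moved)

        external = external_repo.get(str(moved))  # the link is now broken
        embedded = read_embedded(moved)           # the metadata travelled along
        return external, embedded


if __name__ == "__main__":
    external, embedded = demo()
    print("external lookup:", external)
    print("embedded lookup:", embedded)
```

A real repository could of course key on a stable identifier rather than a path, which softens the problem; the sketch only illustrates the failure mode when the link depends on something that changes.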
The real problems to address are content/digital asset management, UI/UX/UCD concerns and metadata management. Do the systems and processes for managing digital objects promote sound content creation principles? Is the system usable, and aesthetically pleasing? Can the metadata schema and controlled vocabularies be shared by all appropriate systems: file shares, document management, content management, search, databases, etc.? Consistent application of corporate terms and the ability to add group-level annotations are both critical.
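As a rough illustration of what "consistent corporate terms plus group-level annotations" could look like in practice, here is a small Python sketch. The vocabulary contents, the `Record` class, and the `tag` function are all hypothetical names invented for this example; a real metadata management system would hold the vocabulary centrally and expose it to every consuming system.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical corporate vocabulary: preferred term -> accepted variants.
CORPORATE_TERMS = {
    "digital asset management": {"dam", "asset management"},
    "enterprise search": {"search", "site search"},
}


def normalize(term: str) -> Optional[str]:
    """Return the preferred corporate term, or None if the term is not controlled."""
    key = term.strip().lower()
    for preferred, variants in CORPORATE_TERMS.items():
        if key == preferred or key in variants:
            return preferred
    return None


@dataclass
class Record:
    """Metadata for one object: controlled terms plus free group annotations."""
    corporate_terms: list = field(default_factory=list)
    group_annotations: dict = field(default_factory=dict)


def tag(record: Record, term: str, group: Optional[str] = None) -> Record:
    """Apply a controlled term if one matches; otherwise file it under a group."""
    preferred = normalize(term)
    if preferred is not None:
        if preferred not in record.corporate_terms:
            record.corporate_terms.append(preferred)
    elif group is not None:
        record.group_annotations.setdefault(group, []).append(term.strip())
    else:
        raise ValueError(f"'{term}' is not a corporate term and no group was given")
    return record


if __name__ == "__main__":
    r = Record()
    tag(r, "DAM")
    tag(r, "Site Search")
    tag(r, "fishbone", group="taxonomy-team")
    print(r)
```

The design choice worth noticing is that variants collapse to one preferred term at tagging time, so every downstream system (search included) sees the same label, while terms outside the vocabulary are kept but clearly scoped to the group that added them.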
A good search engine will be even better for the addition of this data at the source of the problem – the content itself. Garbage in, garbage out! If you don’t have a good CMS or MMS (metadata management system) in place, and your team thinks that search is the answer… ask them to think again. Try using a fishbone diagram or the 5 Whys technique. You may be surprised at what you uncover.