Like Daniela, I too am attending the 2008 KMWorld/Enterprise Search Summit West/Taxonomy Boot Camp meta-conference in San Jose. Shortly before lunch, we heard from Gary Szukalski, Vice President, Customer Relations, Autonomy. He spoke about Meaning-Based Computing. I’ve known Gary for a number of years and he did not disappoint – his message becomes more refined each time I see him speak.
Gary spoke of a “major paradigm shift” in the IT industry. For years, we (IT practitioners and vendors) have been forced, unnaturally, to aggregate, dumb down, and structure the mess of unstructured data that makes up approximately 80% of an organization’s information assets. Why have we done this? Because that’s how computers work – they need structure. We are moving into a world where we can stop forcing structure onto data, because computers will understand the semantics of what they are storing and indexing.
Now, he didn’t say semantic web or semantic technologies. He talked about meaning: how do we teach our machines to disambiguate terms? He gave an Enron example: “shred” means destroying paper documents, but in the Enron corpus it also refers to slicing vegetables. It is also a snowboarding reference. How does the machine know? This is where Autonomy is heading.
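To make the “shred” problem concrete, here is a minimal sketch in Python of one simple way a machine might guess the intended sense: count how many words in the surrounding sentence overlap with a small set of words associated with each sense. I am not claiming this is how Autonomy's engine works. The sense labels, the word lists, and the function name `disambiguate` are all invented for illustration, and a real system would learn such associations from the corpus rather than hard-code them.

```python
import re

# Hypothetical sense signatures, invented for illustration only.
# A real system would derive these from the documents it indexes.
SENSES = {
    "destroy documents": {"paper", "documents", "files", "records", "audit", "shredder"},
    "cooking": {"cabbage", "carrots", "cheese", "vegetables", "recipe", "salad"},
    "snowboarding": {"snow", "slopes", "board", "powder", "mountain", "lift"},
}


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose signature shares the most words with the sentence.

    Returns None when no signature word appears in the sentence at all.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & signature) for sense, signature in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("Please shred the audit files and paper records tonight"))
print(disambiguate("Shred the cabbage and carrots for the salad"))
print(disambiguate("We plan to shred fresh powder on the mountain"))
```

The point of the sketch is only that context, not the word itself, carries the meaning: the same term resolves differently depending on its neighbours, and with no context at all the function has nothing to go on.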
Why would we care? Gary spoke of the December 2006 amendments to the Federal Rules of Civil Procedure. In a nutshell, these amendments made all relevant electronically stored information subject to discovery in a legal case. There are definite ROI measures to be had from using the right discovery tools to protect organizations from legal troubles. This brought to my mind the Sedona Principles as well – legal guidelines regarding the importance of metadata.
Pan-enterprise search is the new buzzword. Rather than aggregating – federating – sources together, a search tool should now be able to index ALL objects, regardless of file or storage type. Glad to hear a top ES vendor saying that finally!
Now, I was a big Verity customer/user at a prior employer. I gave them a great deal of feedback on their tools. One thing that always gnawed at me, born from my library roots, was that the definitions of the categories and topics that improved search relevance were locked in the tools. My organization defined them, but we couldn’t share them easily. Only the evidence of their existence was visible, in the form of better search results and faceted browsing. But the critical thing about “meaning” is that it be shared. In the “shred” example above, I understood fully its importance in the Enron context. But my first thought on hearing the word was cooking, while the woman next to me thought of snowboarding. How does an organization use the power of the tool to educate the users of the tool? Who is working on the UI part of this paradigm shift? And who is thinking about the UI in the context of information security? Secure search should provide access at the role, group, organization or public level. Is Autonomy using open standards to minimize the effort of integrating metadata pan-enterprise? For me, pan-enterprise is not just behind the firewall; it extends onto the web in the form of corporate messaging and consumer feedback. Are any of the enterprise search vendors using open methods to allow this kind of integration? I’m interested in hearing, as I left the search world behind a couple of years ago and have drifted towards the outer edges of the space.
This was one of the better presentations this morning, and I hope they post the slides somewhere soon.