The Unknowns
My last blog (Text Analytics Use Cases – The Knowns) focused on working with knowns in text analytics. More often than not, however, text analytics is used to expose the unknowns or, more precisely, to merge the knowns and unknowns so that unknown content can be placed into the known framework of the organization or its business processes.
Content & Gap Analysis
A fundamental step in building any information framework is taking stock of what you have and prioritizing content within your overall information strategy. Even as we strive for sophisticated solutions using ontologies and machine learning, we often lack a basic understanding of what content we own, what business processes that content is part of, or even where the content lives.
A typical problem in an organization is not knowing where content exists. In addition, you may not know whether you have access to it, and even when you know where the content is and how to reach it, you may struggle to find anything within it. These basic content access issues are often the result of poor governance, or of governance entropy over time as systems change, employees come and go, and business priorities shift. Information, like energy, is subject to entropy: it becomes more disordered and random and loses its impact over time.
Sorting the Archives
A good use case for discovering the unknown with text analytics is the file share that has not been well governed or whose governance has broken down. File shares also typically lack acceptable search solutions, since they are often treated as working locations or as archives for old, seemingly irrelevant content that users can’t bring themselves to delete for good. However the file share became chaotic and its content unknown, running text analytics over the content can expose what’s inside. Whether you use a known organizational taxonomy to categorize the content or entity recognition and extraction to discover new concepts, the content’s coverage is revealed. Once the unknown is exposed, decisions can be made about whether the content should be left as an archive, made searchable, or discarded. The answer may be a combination of all three.
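As a rough illustration of the taxonomy route, the sketch below tags documents by matching taxonomy terms against their text. The taxonomy, the sample documents, and the categorize function are all hypothetical, and a real text analytics tool would use far richer matching than simple keyword lookup.

```python
import re

# Hypothetical organizational taxonomy: category -> indicative terms.
TAXONOMY = {
    "Finance": {"invoice", "budget", "payroll"},
    "Legal": {"contract", "nda", "liability"},
    "HR": {"onboarding", "benefits", "payroll"},
}

def categorize(text, taxonomy=TAXONOMY):
    """Return the sorted list of taxonomy categories whose terms appear in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, terms in taxonomy.items() if words & terms)

# Hypothetical file share contents (file name -> extracted text).
documents = {
    "q3_budget.txt": "Draft budget and invoice schedule for Q3.",
    "vendor_nda.txt": "Signed NDA and contract terms with the vendor.",
    "misc_notes.txt": "Lunch order for the team offsite.",
}

for name, text in documents.items():
    print(name, "->", categorize(text) or ["(uncategorized)"])
```

Documents that match nothing in the taxonomy are the interesting ones here: they are candidates for entity extraction, for a new taxonomy concept, or for the discard pile.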
Text analytics processing can not only reveal what’s in previously unknown content, it can also act as a content gap analysis, showing what useful content the organization has and what useful content does not exist.
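A gap analysis of this kind can be sketched as a simple coverage count: for each category the organization expects to have content for, how many documents actually landed there? The category names and the tagged documents below are made up for illustration, as is the coverage_report function.

```python
from collections import Counter

# Hypothetical taxonomy categories the organization cares about.
EXPECTED_CATEGORIES = ["Finance", "HR", "Legal", "Product Support"]

# Hypothetical output of a categorization pass: one list of categories per document.
tagged_documents = [
    ["Finance"],
    ["Finance", "HR"],
    ["Legal"],
    [],  # nothing matched: a candidate for review or for new concepts
]

def coverage_report(tagged, expected):
    """Count documents per expected category and list categories with no content."""
    counts = Counter(cat for cats in tagged for cat in cats)
    coverage = {cat: counts.get(cat, 0) for cat in expected}
    gaps = [cat for cat, n in coverage.items() if n == 0]
    return coverage, gaps

coverage, gaps = coverage_report(tagged_documents, EXPECTED_CATEGORIES)
print("Coverage:", coverage)
print("Gaps:", gaps)
```

In this made-up data, Product Support is the gap: the organization expects content there, but nothing in the file share maps to it.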
Voice of the Customer
A significant use case for marketers is mining the voice of the customer from unstructured and scattered sources like social media posts, reviews, and customer call logs. As long as these are converted into an accessible text stream, the content can be analyzed. Again, an organizational viewpoint can be imposed on the information using a known taxonomy (or, more likely, taxonomies) or new and previously unknown information can be extracted.
While there is still a need for customer focus groups, tailored surveys, and other traditional methods of gathering customer needs, expectations, and preferences, the fragmented nature of the online world has created rich bodies of information. The problem, more often than not, is accessing this information and making sense of it. The sheer volume and the incredible velocity of content make it difficult to analyze even when it is readily accessible. Once the formidable challenges of access and processing have been overcome, text analytics comes into play by analyzing the incoming content, extracting concepts, clustering recurring themes, and feeding the results into actionable information applications.
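To give a feel for what surfacing recurring themes can mean at its very simplest, the sketch below counts which terms show up across more than one customer post. This is plain document-frequency counting, not true clustering or concept extraction, and the posts, stopword list, and recurring_terms function are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list and customer posts.
STOPWORDS = {"the", "a", "is", "and", "my", "was", "to", "it", "too", "but", "i", "after"}

posts = [
    "The battery life is too short",
    "Love the screen but battery life was disappointing",
    "Screen cracked after a week",
]

def recurring_terms(texts, min_posts=2):
    """Return terms that appear in at least min_posts distinct texts."""
    doc_freq = Counter()
    for text in texts:
        # Use a set so each term counts once per post.
        terms = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        doc_freq.update(terms)
    return sorted(t for t, n in doc_freq.items() if n >= min_posts)

print(recurring_terms(posts))
```

Even this crude count hints at two themes in the sample posts, battery life and the screen, which is the kind of signal a real tool would refine and feed into downstream applications.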
While actual voice through captured phone conversations and video sources may seem inaccessible, there are numerous tools which convert speech to text if text is the primary input into your text analytics tool.
Sentiment Analysis
Sentiment analysis has improved over the years. Probably the least accurate approach to date is based simply on dictionaries of positive, neutral, and negative keywords. When a keyword is matched, the tool applies a sentiment score and returns the results. The main problem with this technique is that a word rarely adheres to a strict definition or use. For instance, “mad sick” might be high praise for a snowboard but a horrible description of a food product. Likewise, a phrase expressing doubt such as “it would have been great if…” doesn’t mean we get to accept the word “great” as positive.
Due to challenges like these, true natural language processing and context need to play into the textual analysis in order to provide more accurate classification, even if the classification remains simply positive, neutral, or negative. Sentiment may be one of the most challenging areas of exposing the unknown, but, like voice of the customer, it may have a significant impact on product development or improvement.
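The weakness of the dictionary approach is easy to reproduce. The sketch below is a deliberately naive scorer with tiny, made-up word lists; the naive_sentiment function and its dictionaries are illustrative assumptions, not any particular tool's implementation.

```python
import re

# Hypothetical, deliberately tiny sentiment dictionaries.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"sick", "mad", "horrible"}

def naive_sentiment(text):
    """Score text by counting dictionary hits; ignores context entirely."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(naive_sentiment("This snowboard is mad sick"))
print(naive_sentiment("It would have been great if it had arrived on time"))
```

Both sentences come back with the opposite label from what the writer meant: the snowboard praise scores as negative, and the complaint scores as positive because of the word “great”.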
Typically, a combination of text analysis and human review and categorization can quickly generate enough sentiment indicators to prompt an action that might otherwise take months or even years to expose and address.
Exploiting what’s known in combination with what was previously unknown can create powerful information applications using text analysis with search and other information systems.