Synaptica talks with Kathryn Stickley, Archivist, BBC Archives. BBC Archives has a remit to archive, preserve and store output with historical, cultural and production reuse value. Kathryn is part of the Digital Archive Services team and gives an insight into how the BBC Archives went about moving from a 50-year-old thesaurus to a taxonomy.
Tell us about your organisation and role
Stickley: I have worked with the BBC for over 35 years, having joined in 1980. Initially, I was part of the Cataloguing team. Recently I moved to a new role in Digital Archive Services. My work is focused on descriptive metadata and managing reference data sets, including taxonomies.
Digital Archive Services supports the whole of the BBC Archive. Archivists sit in different areas, including TV and Radio, and there are quite a few of us. We have a variety of systems across the BBC which hold data and information. Across these archives we use a mix of taxonomy and keywords. Our main role is to ensure metadata content is in the right place, stored and shared regardless of its format.
Why did you need a taxonomy management system?
Stickley: Over the years we have introduced and used a variety of different archive systems, and their uses have evolved. Classification has been fundamental to our catalogue since the 1930s, when we used keywords. During the 1960s a new system based on UDC was introduced: Lonclass (short for London Classification). Then, in the 1980s, thousands of index cards were transferred to a computerised system, Lisi. Around this time the BBC also had an in-house computer system, Infax, and Lisi was part of it.
Move forward to 2012 and the introduction of Fabric, which brought further change. Data was migrated from Infax to Fabric, including Lonclass with its more than 350,000 terms. Natural language replaced strings of numbers, but we kept our overall hierarchical structure. Terms were extracted into several small taxonomies, and additional complex terms were absorbed into our renamed taxonomy. To manage these taxonomies, we needed a taxonomy management tool that could be integrated with the Fabric system. We chose Synaptica KMS for this specific role.
Tell us about the selection process
Stickley: As you can imagine the BBC procurement process is incredibly thorough. Once we decided to obtain a taxonomy management system we liaised with a variety of suppliers. There were many elements which impressed me. Synaptica stood out during their presentation, covering all the points we needed. In addition to this, we had access to a test system and could produce relevant reports straight away, easily and quickly. This was a real bonus for the project. Lonclass and the additional complex terms were morphed into 10 taxonomies (one of which became Subject) by our Data Migration team.
These were then uploaded to Synaptica for integration with Fabric. The 10 taxonomies are all interlinked, and using Synaptica KMS makes managing them simple. Long-term, we also wanted to be prepared for federated search. It’s a longstanding aim to have a search portal across the multiple BBC Archives. This would save on having to migrate huge amounts of metadata from one system to another. It’s something we need to be prepared for in the future. The final step in our taxonomy story was to create a smaller but linked taxonomy for the Digital Archive in 2016.
Were there any other major challenges to implementation?
Stickley: The timescale for this project was a major issue and incredibly tight. I had three months to deliver the project and limited manpower, basically me – and this was all done in addition to my day job. The bespoke taxonomy for the Digital Archive was based on the Subject Taxonomy. This meant we could map the two taxonomies together. Terms would be familiar to users and search across old and new archives was feasible. We agreed on several constraints, for example, the number of terms and limitations on hierarchical depth. This way anyone new to taxonomy would find navigating the hierarchy less intimidating.
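As an editorial illustration of the constraints described above, here is a minimal Python sketch. It checks a reduced taxonomy against a cap on the number of terms and a limit on hierarchical depth, and confirms that every term maps to a term in a larger parent taxonomy. The limits, term labels and function names are all hypothetical; the interview does not give the actual figures, and this is not how Synaptica KMS or Fabric implement such checks.

```python
# Hypothetical limits; the interview does not state the real values.
MAX_TERMS = 6
MAX_DEPTH = 2

# Reduced taxonomy as child -> parent (None marks a top-level term).
# The labels are invented for illustration only.
digital_archive = {
    "Arts": None,
    "Music": "Arts",
    "Theatre": "Arts",
    "Politics": None,
    "Elections": "Politics",
}

# Terms available in the larger (hypothetical) Subject taxonomy,
# and the mapping from each reduced term to one of them.
subject_terms = {"Arts", "Music", "Theatre", "Politics", "Elections", "Opera"}
mapping = {term: term for term in digital_archive}


def depth(term, parents):
    """Return the hierarchy level of a term (top-level terms are level 1)."""
    level = 1
    while parents[term] is not None:
        term = parents[term]
        level += 1
    return level


def check_taxonomy(parents, mapping, subject_terms, max_terms, max_depth):
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []
    if len(parents) > max_terms:
        problems.append(f"too many terms: {len(parents)} > {max_terms}")
    for term in parents:
        if depth(term, parents) > max_depth:
            problems.append(f"too deep: {term}")
        if mapping.get(term) not in subject_terms:
            problems.append(f"unmapped: {term}")
    return problems


if __name__ == "__main__":
    for problem in check_taxonomy(
        digital_archive, mapping, subject_terms, MAX_TERMS, MAX_DEPTH
    ):
        print(problem)
```

Running a check like this before each release of the smaller taxonomy would flag any term that breaks the agreed limits or has lost its link to the parent taxonomy.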
What features of Synaptica KMS do you like?
Stickley: The reports stand out for me. Specifically, the Global Editor and Exclusion Reports are great. Many other systems that I use have limited options to perform bulk changes, with no opportunity to review. Being able to run detailed reports easily and quickly makes a difference. It did what it said it would do. When I was reducing the taxonomy, the reports were so helpful, allowing me to review, amend, eliminate and chip away at the data. Also, the undelete option is a real lifesaver.
What advice do you have for others developing a taxonomy management project?
Stickley: Come up with a plan before you start: What is the purpose of your taxonomy? Think about how it will be searched and who will be using the taxonomy. Make sure you create your structure to fit your purpose. Also, think about your criteria for terms. In my head, I knew what it was for, e.g. grouping content – looking at subjects and topics in a broad way. It will take time but will be a worthwhile investment.
What do you think are the biggest challenges for your sector for the future?
Stickley: Computer-assisted indexing (CAI) and automatic classification. It’s something that we’ve been investigating on and off for over ten years; would it work for us? We found that tagging based on associated text picked up tags that were not really relevant. Perhaps CAI works for some sectors, such as publishing and academia, where documents and papers are formally structured and clear as to their subject matter. For the BBC, with our broad range of content and media, we seem to get weird results.
Let’s take the BBC One Show. The programme might include a new play, a political interview, a pop concert or a local news story: a selection of random subjects which are not linked other than by appearing on the programme. On the other hand, for content such as radio, applying an automatic classification algorithm to the script may not be perfect but, where no other cataloguing is available, can be considered good enough. The cost and time savings would be beneficial, but CAI doesn’t work well for all content types and applications. The BBC archives have been around for over 50 years and will be around for a long time to come. Any savings must be balanced against the long-term impact on search.
Synaptica Insights is a new series of case studies sharing stories and learnings from Synaptica customers, partners, influencers and colleagues.