Putting the World Service radio archive online with machine-generated and crowd-sourced metadata.
Project from 2011 to 2014
What we are doing
We built a prototype website containing the whole of the BBC World Service English-language radio archive. We did this by developing algorithms that listen to the radio programmes and create new descriptive metadata automatically, and we then provided the ability for people to correct or add to this data. The video above shows the prototype in action.
Why it matters
We want to make it easier to catalogue and cross-reference large video and audio collections such as the BBC's archive, and therefore create enjoyable and useful ways to explore our wealth of programmes and discover hidden gems when the archives are made public. To do this we need metadata about these programmes, and often it doesn't exist in a useful form.
Manually tagging programmes with metadata is expensive and time-consuming, so we are researching advanced algorithms and machine-learning techniques that can do it automatically. And where these methods aren't good enough, we want to harness the power of the crowd to improve the metadata.
Our Goals
- To develop automated methods to create metadata for audio-visual archives where none, or not much, exists
- To develop features that encourage people to add to this automated metadata, and to understand if this leads to increased accuracy
- To determine if it is acceptable to launch an archive where the metadata hasn't been comprehensively checked by hand
- To explore the features required to make such an archive proposition work
- To understand what kind of metadata and tags are good and useful
Outcomes
This project started as part of the ABC-IP workstream and is a follow-up to KiWi, a project aimed at using Amazon Web Services to process the large amount of audio in the World Service archive. Some components of this have been made available on GitHub, including Ruby parsers for Wikipedia's infoboxes.
As well as giving us the chance to explore KiWi and cloud processing further, this project resulted in a prototype for the World Service audio archive.
Following this project, BBC World Service worked to transfer many of the programmes into iPlayer, making over 20,000 additional archive programmes available to the public.
How it works
Our starting point is the massive audio archive of the World Service in English, dating back six decades and covering over 70,000 radio programmes, or more than three years' worth of continuous audio. Metadata for this archive is currently sparse or non-existent.
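A quick scale check on those figures (the ~25-minute average programme length used here is an illustrative assumption, not stated in the source):

```python
# Rough sanity check for the archive figures quoted above.
programmes = 70_000
avg_minutes = 25  # assumed average duration per programme (illustrative)

total_hours = programmes * avg_minutes / 60
total_years = total_hours / (24 * 365)
print(f"{total_hours:,.0f} hours, roughly {total_years:.1f} years of continuous audio")
```

At that assumed average length, 70,000 programmes come to a little over three years of continuous audio, consistent with the figure above.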
To counter this, we first used speech-to-text technology to create transcripts, albeit "noisy" ones. We then built a "semantic tagger" called KiWi, specially designed to work on these noisy transcripts, that automatically assigns topics, drawn from Wikipedia's store of structured data, to the radio programmes.
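A much-simplified sketch of that tagging step, assuming a tiny hand-made topic lexicon in place of the Wikipedia-derived data (the names, lexicon, and threshold here are illustrative, not the project's actual KiWi implementation):

```python
import re
from collections import Counter

# Illustrative topic lexicon mapping surface forms to topic identifiers;
# the real tagger drew its topics from Wikipedia-derived structured data.
TOPIC_LEXICON = {
    "nelson mandela": "Nelson_Mandela",
    "apartheid": "Apartheid",
    "south africa": "South_Africa",
}

def tag_transcript(transcript, min_mentions=2):
    """Assign topics whose surface forms appear often enough in a
    noisy speech-to-text transcript."""
    text = re.sub(r"[^a-z ]", " ", transcript.lower())
    counts = Counter()
    for surface, topic in TOPIC_LEXICON.items():
        counts[topic] = len(re.findall(surface, text))
    # Require several mentions so that one-off speech-recognition
    # errors do not become tags.
    return [t for t, n in counts.most_common() if n >= min_mentions]

transcript = ("nelson mandela spoke about apartheid... apartheid laws "
              "in south africa... mandela said south africa...")
print(tag_transcript(transcript))
```

Working from mention counts rather than single matches is one simple way to tolerate noisy transcripts: a topic only becomes a tag if it recurs.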
From this data we have built a prototype website that lets people explore this archive. And while doing so they can approve, correct, or add to this machine-generated metadata to make the whole thing better for all. You can read more about the project on our blog.
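One simple way to fold audience feedback into machine-generated tags is to keep a running score per tag and drop tags the crowd rejects. This is an illustrative aggregation scheme, not the prototype's actual logic:

```python
# Hypothetical vote aggregation: each machine-generated tag starts with a
# small prior score, and listener approvals/rejections shift it up or down.
def aggregate_tags(machine_tags, votes, prior=1, threshold=0):
    """Return tags whose score (prior + approvals - rejections) stays
    above the threshold; tags added by users start from a zero prior."""
    scores = {tag: prior for tag in machine_tags}
    for tag, approve in votes:
        scores.setdefault(tag, 0)  # a brand-new tag added by a user
        scores[tag] += 1 if approve else -1
    return sorted(t for t, s in scores.items() if s > threshold)

votes = [("Apartheid", True), ("Cricket", False),
         ("Cricket", False), ("Nelson_Mandela", True)]
print(aggregate_tags(["Apartheid", "Cricket"], votes))
```

Here the machine-suggested "Cricket" tag is voted out by two rejections, while a user-added "Nelson_Mandela" tag survives with one approval.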
Project Partners
- Metadata specialists