BBC

The World Service Radio Archive

Unlocking the BBC World Service archive

Published: 1 January 2011

Putting the World Service radio archive online with machine-generated and crowd-sourced metadata.

Project from 2011 - 2014

What we are doing

We built a prototype website containing the whole of the BBC World Service English-language radio archive. To do this, we developed algorithms that listen to the radio programmes and automatically create new descriptive metadata, and we then gave people the ability to correct or add to this data. The video above shows the prototype in action.

Why it matters

We want to make it easier to catalogue and cross-reference large video and audio collections such as the BBC's archive, and so create enjoyable and useful ways to explore our wealth of programmes and discover hidden gems when the archives are made public. To do this we need metadata about these programmes, and often it doesn't exist in a useful form.

Manually tagging programmes with metadata about them is expensive and time-consuming, so we are researching advanced algorithms and machine-learning techniques that can do it automatically. And where these methods aren't good enough, we want to harness the power of data to improve the metadata.

Our Goals

  • To develop automated methods to create metadata for audio-visual archives where none, or not much, exists
  • To develop features that encourage people to add to this automated metadata, and to understand if this leads to increased accuracy
  • To determine if it is acceptable to launch an archive where the metadata hasn't been comprehensively checked by hand
  • To explore the features required to make such an archive proposition work
  • To understand what kind of metadata and tags are good and useful

Outcomes

This project started as part of the ABC-IP workstream and is a follow-up to KiWi, a project aimed at using Amazon Web Services to process the large amount of audio in the World Service archive. Some components of this have been made available on GitHub, including Ruby parsers for Wikipedia data.

As well as giving us the chance to explore KiWi and cloud processing further, this project resulted in a prototype for the World Service audio archive.

Following this project, BBC World Service worked to transfer many of the programmes into iPlayer, resulting in over 20,000 additional archive programmes available to the public.

How it works

Our starting point is the massive audio archive of the World Service in English, dating back six decades and covering over 70,000 radio programmes, or more than three years' worth of continuous audio. Metadata for this archive is currently sparse or non-existent.

To counter this, we first use speech-to-text technology to create transcripts, albeit "noisy" ones. We have then built a "semantic tagger" called KiWi, specially designed to work on these "noisy" transcripts, that automatically assigns topics to the radio programmes. The topics are drawn from Wikipedia's store of structured data.
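To make the tagging step concrete, here is a minimal sketch in Python. It is not the project's tagger: the term table, the topic names and the `min_hits` threshold are all assumptions invented for illustration. It shows one simple way a noisy transcript can still support topic assignment: a topic is kept only when several of its associated terms are recognised, so a single mis-transcribed word does not produce a tag.

```python
import re
from collections import Counter

# Hypothetical lookup from lower-case terms to topic names.
# A real system would derive this from a structured knowledge base.
TOPIC_TERMS = {
    "nairobi": "Kenya",
    "kenya": "Kenya",
    "kenyan": "Kenya",
    "election": "Election",
    "ballot": "Election",
    "voters": "Election",
    "cricket": "Cricket",
}


def tag_transcript(transcript, min_hits=2):
    """Return (topic, hits) pairs, most frequent topic first.

    A topic is kept only if at least `min_hits` of its terms occur,
    which filters out isolated recognition errors.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    hits = Counter(TOPIC_TERMS[w] for w in words if w in TOPIC_TERMS)
    return [(topic, n) for topic, n in hits.most_common() if n >= min_hits]


# A made-up "noisy" transcript: "ours" stands in for a misheard "hours",
# and the stray "cricket" for a recognition error.
noisy = ("the kenyan election was held today in nairobi voters queued "
         "for ours to cast a ballot cricket")
print(tag_transcript(noisy))
```

In this toy example the stray word "cricket" appears only once, so it falls below the threshold and no "Cricket" tag is assigned.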

From this data we have built a prototype website that lets people explore the archive. While doing so, they can approve, correct, or add to the machine-generated metadata to make the whole thing better for all. You can read more about this work on our blog.
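As an illustration of how approving, correcting and adding tags could fit together, here is a small Python sketch. The class names, the one-vote-per-user rule and the `hide_below` threshold are assumptions for the example, not a description of how the prototype stores or ranks its data.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    topic: str
    source: str                              # "machine" or "user"
    up: set = field(default_factory=set)     # ids of users who approved
    down: set = field(default_factory=set)   # ids of users who rejected

    def score(self):
        return len(self.up) - len(self.down)


class Programme:
    def __init__(self, machine_topics):
        # Start from the automatically generated tags.
        self.tags = {t: Tag(t, "machine") for t in machine_topics}

    def vote(self, user, topic, approve=True):
        """Record one vote per user per tag; approving an unknown topic adds it."""
        tag = self.tags.get(topic)
        if tag is None:
            if not approve:
                return  # nothing to reject
            tag = self.tags[topic] = Tag(topic, "user")
        # A later vote from the same user replaces the earlier one.
        tag.up.discard(user)
        tag.down.discard(user)
        (tag.up if approve else tag.down).add(user)

    def visible_topics(self, hide_below=-1):
        """Topics scoring above `hide_below`, best first (ties broken by name)."""
        kept = [t for t in self.tags.values() if t.score() > hide_below]
        return [t.topic for t in sorted(kept, key=lambda t: (-t.score(), t.topic))]


p = Programme(["Election", "Kenya", "Cricket"])
p.vote("alice", "Cricket", approve=False)   # correct a wrong machine tag
p.vote("bob", "Cricket", approve=False)
p.vote("alice", "Kenya")                    # approve a good machine tag
p.vote("bob", "Nairobi")                    # add a tag the machine missed
print(p.visible_topics())
```

Keeping machine and user tags in one structure, with a `source` field, means the same voting rule applies to both, while still allowing the two kinds of tag to be compared later.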

Project Team

  • Yves Raimond (PhD), Senior R&D Engineer
  • Chris Lowis (PhD), Senior Research Engineer
  • Pete Warren, Interaction & User Experience Designer
  • Theo Jones, Creative Director UX
  • Andrew Nicolaou, User Interface Developer
  • Michael Smethurst, Development Producer
  • Tristan Ferne, Lead Producer
  • Gareth Adams, Software Engineer
  • Anthony Onumonu, Principal Software Engineer
  • Tom Nixon, Project R&D Engineer
