BBC

Research & Development

Posted by Gareth Adams

We're opening up wiki-style editing on the prototype, in an effort to understand how accurate community-sourced metadata can be, compared to manual curation.

The World Service Archive prototype sits neatly in the middle of a Venn diagram of things the BBC doesn't traditionally do: opening up historical BBC content, using machine-learning techniques, and collecting metadata from members of the public. Because this space is already a little bit different, it's a good place for us to be more experimental with new interfaces and approaches.

The Archive (as the name suggests) is made up of data from the last 60 years of the World Service. It contains basic details about each programme – genre, title, a brief summary and the transmission date – but for us to be able to develop useful ways of navigating an archive of 70,000 episodes, we need to know more. Simply having two programmes with descriptions that both contain the word "Queen" isn't enough to know that they're both about Freddie Mercury's band.

Up until now, we've just been asking users to tell us what is in a programme by adding tags representing the people, topics and concepts that are involved. Every tag added has to correspond to a page on Wikipedia, and this knowledge is invaluable – we get unambiguous conceptual links between programmes about the same topics.
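The value of anchoring every tag to a Wikipedia page can be sketched in a few lines. This is a hypothetical illustration (the `Programme` struct and `related?` helper are inventions for this post, not the prototype's actual code): two programmes are conceptually linked only if they share a Wikipedia article, not merely a word.

```ruby
# Hypothetical sketch: tags keyed by Wikipedia article give unambiguous links.
Programme = Struct.new(:title, :tags)

band    = "https://en.wikipedia.org/wiki/Queen_(band)"
monarch = "https://en.wikipedia.org/wiki/Elizabeth_II"

docs = [
  Programme.new("The Story of Bohemian Rhapsody", [band]),
  Programme.new("The Coronation Remembered",      [monarch]),
  Programme.new("Freddie Mercury: A Life",        [band])
]

# Two programmes share a topic only if their tag sets intersect --
# matching on the word "Queen" alone could confuse the two concepts.
def related?(a, b)
  !(a.tags & b.tags).empty?
end
```

Here `related?(docs[0], docs[2])` is true (both tagged with the band's article), while `related?(docs[0], docs[1])` is false even though both descriptions might mention "Queen".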

Users have been great at that task, adding thousands of tags, but we found they also started telling us about other errors in the data, for example a spelling mistake in a programme title, an incorrect credit for the writer of a radio play, or a transmission date that can't possibly be right because it's before the event that the programme is discussing. Most of these are probably transcription errors from when the archive was digitised, and it's very useful to know about them and get them corrected.

In response, we're giving our users the ability to submit new titles and synopses for World Service programmes. These changes feed back immediately into the site and our search tool, and the full edit history is available for all to see.
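The "immediate, but with full history" behaviour can be sketched as an append-only log of edits. This is a minimal illustration, not the prototype's actual data model; the `EditableField` class is an assumption made up for this example:

```ruby
# Hypothetical sketch: every submitted edit is kept, so the current value
# updates immediately while the full history remains visible.
class EditableField
  attr_reader :history

  def initialize(original)
    @history = [original]
  end

  def submit(new_value)
    @history << new_value   # the edit takes effect straight away...
  end

  def current
    @history.last           # ...and earlier versions are never lost
  end
end

title = EditableField.new("Asignment America")   # transcription error
title.submit("Assignment America")               # community correction
```

After the correction, `title.current` returns the fixed spelling, and `title.history` still contains both versions for anyone to inspect.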

Now, while making your website editable seems like an easy way to improve its quality and get more people involved, it's not as simple as adding a "Save" button and hoping for the best. At the BBC we're lucky to have volunteers who are motivated just by the idea of improving our collective knowledge (rather than by more tangible rewards), but it's also vital for us to consider the technical and editorial implications of this approach.

Security

It's not that web security is especially hard, but there are so many links in the chain between a user and the content they're accessing that there are a lot of places where things can go wrong.

One of the more common problems with user-generated content is known as cross-site scripting (XSS), where instead of just adding text to a site, a malicious user is able to add code (usually JavaScript). This code then gets run on a victim's computer when they view the compromised web page. An attack like this can't (by itself) access data on your computer or on websites other than the one compromised, but it could potentially learn your username and password for that site, or carry out actions as if you had made them yourself, so it's still a serious problem.

There are now more and more tools that make it easier to build sites which are resilient to this attack. The web framework that the prototype is built on (Ruby on Rails) comes with this protection built in, so we can be confident that we're secured against XSS.
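Rails achieves this by HTML-escaping user-supplied strings in templates by default. The effect is the same as escaping the input by hand, sketched here with Ruby's standard `CGI` library (the example input is invented for illustration):

```ruby
require "cgi"

# A string a malicious user might submit as a programme title:
malicious = %(<script>alert("stolen cookies")</script>)

# Escaping turns the markup into harmless text, so the browser
# displays it rather than executing it.
safe = CGI.escapeHTML(malicious)
```

After escaping, `safe` contains `&lt;script&gt;...` instead of a live `<script>` tag, so the victim's browser renders the attack code as plain text.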

Accuracy

In my opinion the more interesting discussion is around accuracy. As a trusted organisation, at the BBC we have a responsibility to provide accurate information to our audience, and there are a lot of editorial processes geared towards stopping us producing incorrect material. But in this instance our source data is already incorrect in a lot of places.

As the world's biggest collaborative website, Wikipedia's experience in this field shouldn't be ignored. Its assumption that open collaboration leads to higher-quality output is central to how it works, but it sits alongside another core guideline: that Wikipedia offers no guarantee that its content is valid. That's a tricky statement to make for an editorial organisation that is trusted to provide accurate information.

But the World Service Archive is an experiment in archive navigation metadata; the immediate aim isn't necessarily to answer all of these questions, just to make sure we have enough data to have an informed discussion. And the only way to do that is to see what contributions people are willing to make.
