
Datalab representing machine learning in the BBC - the experiment

Svetlana Videnova

Business Analyst, BBC+

One of the key objectives for the BBC for the coming years is to focus on younger audiences. Machine Learning recommendation capabilities can help us achieve that.

In a recent speech to BBC staff, Tony Hall described our shared ambition “to grow our weekly online reach with younger audiences from 55% to 90% within four years”. We expect that providing personalised content via the Datalab platform to the new BBC+ app, along with what we learn throughout the experiment, will begin to contribute to this growth. Product Manager James Metcalfe has described the approach and objectives of BBC+ separately.

The aim of Datalab, part of BBC Design + Engineering, is to help achieve this vision and reach younger audiences, via a new way of working and through experimentation.

Datalab is the first BBC platform working with Google Cloud Platform in production. Pioneering this integration has meant refining our approach to information security and the data privacy approval process, as well as establishing a new infrastructure.

As if this wasn’t challenging enough, we brought more excitement to our engineering team by including integration, Elasticsearch, Kubernetes and Spinnaker for container-based deployments, and Drone as part of our stack. As a team we decided to implement these technologies, even though not everyone was experienced with them. As a result, we have adopted some more than others, but one of the key points is that we learn along the way and gain skills that will be valuable later in the programme.

As a platform, we had to connect to the existing BBC data stores, including our User Activity Store (UAS) database, as well as serving the media content from a different AWS database.

This infrastructure allowed us to provide the groundwork for data scientists to start exploring various methods to satisfy the needs of BBC products and master the personalised experience for BBC users, starting with the BBC+ app.

For example, new users, who have no previous history with the BBC, are given a cold start recommendation, so that they can begin their BBC+ journey. The first focus for the Datalab team was to create these recommendations to serve relevant content. The platform then progressively incorporates a user’s history, refines their content and helps them discover interesting BBC suggestions.
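The hand-over from cold start to personalised recommendations can be sketched roughly as below. This is an illustration only, not the Datalab implementation: the function name `recommend`, the `popular_items` fallback list and the `personalised_ranker` callable are all assumptions, and the post does not say how cold start recommendations are actually chosen.

```python
def recommend(user_history, popular_items, personalised_ranker, n=5):
    """Return up to n item IDs the user has not already seen.

    popular_items is a hypothetical fallback list used for cold start;
    personalised_ranker is a hypothetical callable: history -> ranked item IDs.
    """
    seen = set(user_history)
    if not user_history:
        # Cold start: nothing is known about the user yet.
        candidates = list(popular_items)
    else:
        # Personalised ranking first, topped up with fallback items if it runs short.
        candidates = list(personalised_ranker(user_history)) + list(popular_items)

    result = []
    for item in candidates:
        if len(result) >= n:
            break
        if item not in seen and item not in result:
            result.append(item)
    return result
```

As the history grows, more of the returned list comes from the ranker and less from the fallback, which is one simple way to read "progressively incorporates a user’s history".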

Finding out what data we had to work with was crucial. We quickly discovered that the metadata for some BBC content is inconsistent. This led us to conversations with our editorial colleagues on ways of tagging and creating content metadata in a more uniform manner, so that we can surface their output to the audience in a more personalised way.

The first content type to be ingested is video clips. In the future the aim is to include audio and articles. Currently in our Elasticsearch DB we have 1,137,598 clips. A set of filters was applied to provide only relevant clips, with complete metadata and editorial risks mitigated:

  • Unique clips
  • Only English content (for now)
  • Filter out audio and weather (for now)
  • Filter out clips older than 2013 (for now)
  • Editorial risk filtering
  • 128 BBC brands are not surfaced in the BBC+ app
  • 8 master brands were filtered out (mainly to help serve only English content for now)

That leaves us with 131,626 clips available for BBC+ users.
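A chain of filters like the one above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `Clip` fields, the `filter_clips` function and the example language, genre and brand values are all hypothetical, and editorial risk filtering is not modelled because the post does not describe how it works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    # Hypothetical metadata fields; the real schema is not given in the post.
    clip_id: str
    language: str
    media_type: str   # e.g. "video" or "audio"
    genre: str
    year: int
    brand: str
    master_brand: str


def filter_clips(clips, blocked_brands, blocked_master_brands, min_year=2013):
    """Return clips that pass every filter, keeping the first copy of each ID."""
    seen_ids = set()
    kept = []
    for clip in clips:
        if clip.clip_id in seen_ids:               # unique clips only
            continue
        seen_ids.add(clip.clip_id)
        if clip.language != "en":                  # English content only
            continue
        if clip.media_type == "audio" or clip.genre == "weather":
            continue
        if clip.year < min_year:                   # drop clips from before min_year
            continue
        if clip.brand in blocked_brands:           # brands not surfaced in the app
            continue
        if clip.master_brand in blocked_master_brands:
            continue
        kept.append(clip)
    return kept
```

Keeping each rule as its own early `continue` makes it easy to relax one of the "for now" filters later without touching the others.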

The following techniques are being used by our data science team, to experiment, score and create “better” recommendations:

  • Model-based collaborative filtering: we’re building embeddings for our content using word2vec models, treating content items as “words” and our playlists as “sentences”
  • Offline scoring: to measure how a particular recommender system is performing, we’re using different metrics, like recency, popularity, Normalized Discounted Cumulative Gain (nDCG) and hit-rate. This helps us to select which versions of our models should go for online scoring
  • Online scoring: we have a system in place to score the recommenders with the live user data, using A/B or hit-rate testing, whenever necessary
  • Combining ML with editorial guidelines: we can prioritise one genre or brand against the others, to fit the editorial needs and better match our audience expectation
  • Knowledge sharing: we share what we know to establish a good understanding of what we are building
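Two of the offline metrics named above, hit-rate and nDCG, are simple enough to show in plain Python. The sketch below uses the standard textbook definitions for a single user's recommendation list; the function names and the choice of a dict of graded relevance scores are assumptions for illustration, not the Datalab team's code.

```python
import math


def hit_rate(recommended, relevant, k):
    """1.0 if any of the top-k recommended items is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in recommended[:k]) else 0.0


def dcg(gains):
    """Discounted cumulative gain: the gain at 1-based rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(recommended, relevance, k):
    """nDCG@k, given a dict mapping item -> graded relevance (missing items count as 0)."""
    gains = [relevance.get(item, 0.0) for item in recommended[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

In practice both values would be averaged over many users, and a model version would move on to online scoring only if it compares well on these averages.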

We came out of this iterative process full of valuable insights, and building a new ML platform was just one of the things we discovered. Datalab established a positive team culture, with a great deal of multi-disciplinary learning. Despite the bumpy road, we now have a clear vision of our engineering and data science responsibilities, including A/B testing our process, trying various agile methodologies, and bringing teams in Salford and London together for closer collaboration.

We are excited about the next steps, and there is a lot to do on all aspects of our platform: infrastructure, engineering, devops, data science and ML. We will be inviting a new BBC team into the Datalab world, beginning with an exploratory session with the Voice Team over Christmas. We will also be proposing collaborations with other product groups, including R&D and News, early in 2019 to continue to experiment, push the boundaries of our exploration further and innovate with ML in the BBC.

If you would like to know more about Datalab, our journey, or you would like to share your own experience, feel free to contact us at datalab@bbc.co.uk.

And in other important news: we are still recruiting!

BBC+ is available on Android and iOS.