麻豆约拍 Blogs - Technology + Creativity at the 麻豆约拍 - Datalab representing machine learning in the 麻豆约拍

One of the key objectives for the 麻豆约拍 for the coming years is to focus on younger audiences. Machine Learning recommendation capabilities can help us achieve that.

In a recent speech to 麻豆约拍 staff, Tony Hall described our shared ambition “to grow our weekly online reach with younger audiences from 55% to 90% within four years”. We expect that providing personalised content via the Datalab platform to the new 麻豆约拍+ app, together with what we learn throughout the experiment, will begin to contribute to this growth. Product Manager James Metcalfe describes the approach and objectives of 麻豆约拍+ in a separate post.

The aim of Datalab, part of 麻豆约拍 Design + Engineering, is to help achieve this vision and reach younger audiences, via a new way of working and through experimentation.

Datalab is the first 麻豆约拍 platform working with Google Cloud Platform in production. Being the pioneer of this integration meant refining our approach to information security and the data privacy approval process, as well as establishing a new infrastructure.

As if this wasn’t challenging enough, we brought more excitement to our engineering team by including integration, Elasticsearch, Kubernetes and Spinnaker for container-based deployments and Drone as part of our stack. As a team, we decided to implement these technologies even though not everyone had experience with them. As a result, we have adopted some more than others, but one of the key points is that we learn along the way and gain skills that will be valuable later in the programme.

As a platform, we had to connect to the existing 麻豆约拍 data stores, including our User Activity Store (UAS) database, as well as serving the media content from a different AWS database.

This infrastructure allowed us to provide the groundwork for data scientists to start exploring various methods to satisfy the needs of 麻豆约拍 products and master the personalised experience for 麻豆约拍 users, starting with the 麻豆约拍+ app.

For example, new users, who have no previous history with the 麻豆约拍, are given a cold start recommendation, so that they can begin their 麻豆约拍+ journey. The first focus for the Datalab team was to create these recommendations to serve relevant content. The system then progressively incorporates a user’s history, refines their content and helps them discover interesting 麻豆约拍 suggestions.
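As a rough illustration of the cold start idea, the sketch below falls back to a list of popular clips when a user has no history, and otherwise tops up a personalised list with popular clips the user has not yet seen. The function name `recommend`, the `popular` list and the `personalised` callable are all hypothetical; this is not the Datalab implementation, and using popularity as the fallback is an assumption.

```python
from typing import Callable, List, Sequence


def recommend(
    history: Sequence[str],
    popular: Sequence[str],
    personalised: Callable[[Sequence[str]], List[str]],
    k: int = 3,
) -> List[str]:
    """Return up to k clip ids, using popular clips when history is missing."""
    if not history:
        # Cold start: nothing is known about this user yet.
        return list(popular[:k])

    seen = set(history)
    # Personalised candidates, minus clips the user has already watched.
    picks = [clip for clip in personalised(history) if clip not in seen]

    # Top up with popular clips that are neither watched nor already picked.
    for clip in popular:
        if len(picks) >= k:
            break
        if clip not in seen and clip not in picks:
            picks.append(clip)
    return picks[:k]
```

As the user builds up a history, more of the returned slots are filled by the personalised candidates and fewer by the popularity fallback.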

Finding out what data we had to work with was crucial. We quickly discovered that the metadata for some 麻豆约拍 content is inconsistent. This led us to conversations with our editorial colleagues on ways of tagging and creating content metadata in a more uniform manner, so that we can surface their output to the audience in a more personalised way.

The first content type to be ingested is video clips. In the future the aim is to include audio and articles. Currently in our Elasticsearch DB we have 1,137,598 clips. A set of filters was applied to provide only relevant clips, with complete metadata and editorial risks mitigated:

Unique clips
Only English content (for now)
Filter out audio and weather (for now)
Filter out clips older than 2013 (for now)
Editorial risk filtering
128 麻豆约拍 brands are not surfaced in the 麻豆约拍+ app
8 master brands were filtered out (mainly to help serve only English content for now)

That leaves us with 131,626 clips available for 麻豆约拍+ users.
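The filter list above can be read as a chain of simple predicates. The following is a minimal sketch under stated assumptions: the `Clip` fields, the `"en"` language code, the `"weather"` genre label and the blocklist arguments are all hypothetical, and the real filtering may well run as Elasticsearch queries rather than in Python.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Clip:
    # Hypothetical metadata fields, for illustration only.
    clip_id: str
    language: str
    media_type: str  # e.g. "video" or "audio"
    genre: str
    year: int
    brand: str
    master_brand: str


def filter_clips(
    clips: Iterable[Clip],
    blocked_brands: Set[str],
    blocked_master_brands: Set[str],
    risky_ids: Set[str],
    min_year: int = 2013,
) -> List[Clip]:
    """Keep unique, English, recent video clips that pass the blocklists."""
    seen: Set[str] = set()
    kept: List[Clip] = []
    for clip in clips:
        if clip.clip_id in seen:
            continue  # unique clips only
        seen.add(clip.clip_id)
        if clip.language != "en":
            continue  # only English content
        if clip.media_type == "audio" or clip.genre == "weather":
            continue  # no audio or weather
        if clip.year < min_year:
            continue  # nothing older than 2013
        if clip.clip_id in risky_ids:
            continue  # editorial risk filtering
        if clip.brand in blocked_brands or clip.master_brand in blocked_master_brands:
            continue  # brands and master brands that are not surfaced
        kept.append(clip)
    return kept
```

Keeping each rule as its own check makes it easy to relax the "for now" filters later without touching the others.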

The following techniques are being used by our data science team, to experiment, score and create “better” recommendations:

Model-based collaborative filtering: we’re building embeddings for our content using word2vec models, treating the content as “words” and our playlists as “sentences”
Offline scoring: to measure how a particular recommender system is performing, we’re using different metrics, like recency, popularity, Normalized Discounted Cumulative Gain (nDCG) and hit-rate. This helps us to select which versions of our models should go for online scoring
Online scoring: we have a system in place to score the recommenders with the live user data, using A/B or hit-rate testing, whenever necessary
Combining ML with editorial guidelines: we can prioritise one genre or brand against the others, to fit the editorial needs and better match our audience expectation
To help share what we know , to establish a good understanding of what we are building
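Two of the offline metrics named above, hit-rate and nDCG, are easy to state in code. The sketch below assumes binary relevance (a clip is either relevant or not) and one held-out clip per user for hit-rate; the function names and data shapes are illustrative assumptions, not the team's scoring code.

```python
import math
from typing import Dict, List, Sequence, Set


def hit_rate(recommended: Dict[str, List[str]], held_out: Dict[str, str], k: int) -> float:
    """Fraction of users whose held-out clip appears in their top-k list."""
    hits = sum(
        1 for user, recs in recommended.items() if held_out[user] in recs[:k]
    )
    return hits / len(recommended)


def ndcg_at_k(recs: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG: discounted gain divided by the ideal gain."""
    # A relevant clip at rank i (0-based) contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, clip in enumerate(recs[:k])
        if clip in relevant
    )
    # Ideal ordering places every relevant clip at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0
```

Hit-rate only asks whether the held-out clip was recommended at all, while nDCG also rewards placing relevant clips nearer the top, which is why the two are often reported together.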

We came out of this iterative process full of valuable insights, and building a new ML platform was just one of the outcomes. Datalab established a positive team culture, with a great deal of multi-disciplinary learning. Despite the bumpy road, we now have a clear vision of our engineering and data science responsibilities, which include A/B testing our process, trying various agile methodologies, and bringing teams in Salford and London together for closer collaboration.

We are excited about the next steps, and there is a lot to do on all aspects of our platform: infrastructure, engineering, devops, data science and ML. We will be inviting a new 麻豆约拍 team into the Datalab world, beginning with an exploratory session with the Voice Team over Christmas. We will also be proposing collaborations with other product groups, including R&D and News, early in 2019, to continue to experiment, push the boundaries of our exploration further and innovate with ML in the 麻豆约拍.

If you would like to know more about Datalab, our journey, or you would like to share your own experience, feel free to contact us at datalab@bbc.co.uk.

And in other important news: we are still recruiting!

麻豆约拍+ is available on Android and iOS.
