Felisia Loukou – Government Digital Service

Data Scientist

Felisia is a data scientist with an academic background in Computer Science and Natural Language Processing. For the past year, she has worked at the Government Digital Service, helping to develop the first application of deep learning on GOV.UK, for content topic classification. She has also led the incorporation of network science into the analysis of user activity on the site, and set up the necessary data extraction and aggregation infrastructure.

Her current team is improving navigation and findability on GOV.UK by developing content recommendation algorithms. Her portion of the work focuses on network representation learning.

Using algorithms to generate related links can improve on human-curated links. More importantly, the approach can scale and be applied automatically to all of the 237k content items on GOV.UK, improving navigation for users. This led to our content designers joking “I for one welcome our robot overlords!”

Users need help to find the content they are looking for. GOV.UK is vast, with over 230k unique pieces of content. Human publishers can hand-curate navigational aids for users by creating related links on a page. These links point users towards similar content that they might find useful, based on the publisher's expert domain knowledge. Only approximately 2k pages on GOV.UK have hand-curated related links, and providing them by hand for all 237k pages would be too onerous.

Instead, we used a variety of machine learning algorithms to generate these links, trained on the text content of the pages themselves and on historic user journey data from BigQuery. We used a journey-level A/B testing framework to assess the impact of the generated links on a variety of metrics describing how related links were used. All the algorithms improved related link use on non-mainstream (“unloved”) journeys. Only the node2vec and LLR algorithms, trained on historic user journeys, improved related link use on mainstream (“loved”) journeys. More importantly, this approach can scale and be applied automatically to all of GOV.UK.

The algorithmic approach was deployed using standard DevOps practices, with the compute handled on an Amazon Web Services (AWS) instance. Generated links were moved to an S3 bucket before being served in production on the site. This made it easy to update the links periodically, adjusting them as content becomes stale or as user needs change with the season. The deployment has the potential to improve the journeys of circa 30k users per day in the near term, and perhaps more in the future as users learn to trust related links more. The approach could also be extended to improve content at publication by suggesting related links to publishers.
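To give a flavour of how journey data can be turned into related links, here is a minimal sketch of one of the journey-based approaches. It assumes that LLR refers to the log-likelihood ratio test on page co-occurrence within journeys; the function names, the journey format (a list of page paths per journey) and the example paths are illustrative assumptions, not the implementation used on GOV.UK.

```python
import math
from collections import Counter


def _xlogx(x):
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts):
    # Unnormalised entropy of a set of counts.
    return _xlogx(sum(counts)) - sum(_xlogx(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 co-occurrence table.

    k11: journeys with both pages, k12: first page only,
    k21: second page only, k22: neither page.
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def related_by_llr(journeys, page, top_n=5):
    """Rank pages that co-occur with `page` more often than chance.

    journeys: list of journeys, each a list of page paths (assumed format).
    """
    sessions = [set(j) for j in journeys]
    n = len(sessions)
    page_count = Counter(p for s in sessions for p in s)
    both = Counter(p for s in sessions if page in s for p in s if p != page)

    scored = []
    for other, k11 in both.items():
        k12 = page_count[page] - k11
        k21 = page_count[other] - k11
        k22 = n - k11 - k12 - k21
        # Keep only positive associations: observed co-occurrence must
        # exceed what independence would predict.
        if k11 * n > page_count[page] * page_count[other]:
            scored.append((llr(k11, k12, k21, k22), other))

    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]
```

Calling `related_by_llr(journeys, "/vat")` returns the pages most strongly associated with `/vat` in the supplied journeys. The positive-association filter is a design choice in this sketch: the statistic on its own is also large when two pages co-occur less often than expected.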

Actionable Takeaway #1
How to make better use of user journey data from BigQuery and derive proxy metrics for testing hypotheses in A/B tests.
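As an illustration of this takeaway, the sketch below derives one possible proxy metric, the share of journeys in which at least one related link was clicked, and compares two variants with a two-proportion z-test. The record fields (`variant`, `related_link_clicks`) and the choice of test are assumptions made for the example; the talk abstract does not specify the metrics or the statistical test actually used.

```python
import math


def summarise(journeys):
    """Return {variant: (journeys_with_a_click, total_journeys)}.

    journeys: iterable of dicts with hypothetical keys
    'variant' and 'related_link_clicks'.
    """
    totals = {}
    for j in journeys:
        used, n = totals.get(j["variant"], (0, 0))
        clicked = 1 if j["related_link_clicks"] > 0 else 0
        totals[j["variant"]] = (used + clicked, n + 1)
    return totals


def two_proportion_z_test(x_a, n_a, x_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value); z is positive when variant B has the higher rate.
    """
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both variants have a rate of exactly 0 or 1: no evidence of a difference.
        return 0.0, 1.0
    z = (x_b / n_b - x_a / n_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With the output of `summarise`, the counts for the control and treatment variants can be passed straight to `two_proportion_z_test`.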

Actionable Takeaway #2
How to generate recommended links for your content based on content similarity and user journey network data fed into different algorithms. 
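For the content-similarity side of this takeaway, a minimal sketch could look like the following. It uses TF-IDF weighting with cosine similarity as a stand-in technique; the abstract does not say which text-based algorithms were used, and the function names and example paths are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """docs: {path: text} -> {path: {term: tf-idf weight}}."""
    counts = {path: Counter(tokenize(text)) for path, text in docs.items()}
    n = len(docs)
    # Document frequency: number of pages containing each term.
    df = Counter(term for c in counts.values() for term in c)
    return {
        path: {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in c.items()}
        for path, c in counts.items()
    }


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def related_by_content(docs, page, top_n=5):
    """Return up to top_n other pages ranked by text similarity to `page`."""
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[page], v), p) for p, v in vecs.items() if p != page]
    scores = [s for s in scores if s[0] > 0]  # drop pages with no shared terms
    scores.sort(reverse=True)
    return [p for _, p in scores[:top_n]]
```

At the scale of GOV.UK, comparing every pair of pages this way would be slow, so a real pipeline would likely rely on a vectorised library or approximate nearest-neighbour search.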

Actionable Takeaway #3
How to deploy your model and update the related links on your site periodically.
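A periodic update step might be sketched as below: the generated links are serialised to JSON and written to an S3 bucket under a dated key. The key layout, the payload shape and the `publish` helper are hypothetical; `client` is assumed to be a boto3 S3 client (or any object with a compatible `put_object` method), which also keeps the sketch testable without AWS credentials.

```python
import json
from datetime import date


def build_payload(links, generated_on):
    """Serialise {page: [related pages]} with the generation date."""
    return json.dumps(
        {"generated_on": generated_on.isoformat(), "links": links},
        sort_keys=True,
    )


def publish(links, client, bucket, generated_on=None):
    """Upload the links as JSON and return the object key used.

    client: assumed to expose put_object(Bucket=..., Key=..., Body=...),
    as a boto3 S3 client does.
    """
    generated_on = generated_on or date.today()
    key = f"related-links/{generated_on.isoformat()}.json"  # hypothetical layout
    body = build_payload(links, generated_on).encode("utf-8")
    client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

Running `publish` on a schedule (for example from a cron job or a CI pipeline) would refresh the links as content and user needs change.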