Dr Matthew Gregory – Government Digital Service

Data Scientist, Innovation Team

Matt is a data scientist with an academic background in genetic engineering and statistics. He has developed expertise in machine learning methodologies and their application to user journey and navigation problems on websites. He’s a former science teacher and has worked across data science teams in the public sector, including the Department for Education and the Government Digital Service.

Matt was one of the first data scientists recruited to the UK Civil Service, and his work to modernise the creation of statistical publications had a major impact across Whitehall departments. Recently he’s worked on GOV.UK, developing an A/B testing framework to analyse BigQuery user journey data. His team are improving navigation on the site by developing algorithms that recommend content to users.

Summary
Using algorithms to generate related links can improve on human-curated links. More importantly, the approach scales: it can be applied automatically to all 237k content items on GOV.UK, improving navigation for users. This prompted our content designers to joke, “I for one welcome our robot overlords!”

Abstract
Users need help to find the content they are looking for. GOV.UK is vast, with over 230k unique pieces of content. Human publishers can hand-curate navigational aids by creating related links on a page, pointing users towards similar content they might find useful based on the publisher’s expert domain knowledge. This is true of approximately 2k pages on GOV.UK; providing hand-curated related links for all 237k pages would be too onerous. Instead, we used a variety of machine learning algorithms to generate these links, trained on the text content of the pages themselves and on historic user journey data from BigQuery.

We used a journey-level A/B testing framework to assess the impact of these links against a variety of metrics measuring how the related links were used. All of the algorithms improved related-link use on non-mainstream (“unloved”) journeys. Only the node2vec and LLR algorithms, trained on historic user journeys, improved related-link use on mainstream (“loved”) journeys.

More importantly, this approach scales and can be applied automatically to all of GOV.UK. It was deployed using standard DevOps approaches, with the compute handled on an Amazon Web Services (AWS) instance; the generated links were moved to an S3 bucket before being served in production on the site. This made it straightforward to update the links periodically, adjusting them as content becomes stale or as user needs change with the season. The deployment has the potential to improve the journeys of around 30k users per day in the near term, and perhaps more in future as users learn to trust related links. The approach could also be extended to improve content at publication by suggesting related links to publishers.
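To make the journey-based scoring more concrete, the sketch below shows one way to compute Dunning’s log-likelihood ratio (LLR) over page co-occurrence within user journeys and use it to rank candidate related-link pairs. It is an illustration only, not the GOV.UK pipeline: the journey format, the toy page paths and the counting scheme are all assumptions.

```python
"""Sketch: LLR scoring of page pairs that co-occur in user journeys.

Toy data and counting scheme are illustrative assumptions, not production code.
"""
import math
from collections import Counter
from itertools import combinations


def _h(*counts):
    """Shannon entropy of a set of counts (normalised internally)."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 co-occurrence contingency table."""
    n = k11 + k12 + k21 + k22
    row_h = _h(k11 + k12, k21 + k22)
    col_h = _h(k11 + k21, k12 + k22)
    mat_h = _h(k11, k12, k21, k22)
    return 2 * n * (row_h + col_h - mat_h)


def related_link_scores(journeys):
    """Score every unordered pair of pages that co-occurs in at least one journey."""
    n = len(journeys)
    page_counts = Counter()
    pair_counts = Counter()
    for journey in journeys:
        pages = set(journey)
        page_counts.update(pages)
        pair_counts.update(combinations(sorted(pages), 2))
    scores = {}
    for (a, b), k11 in pair_counts.items():
        k12 = page_counts[a] - k11   # journeys with a but not b
        k21 = page_counts[b] - k11   # journeys with b but not a
        k22 = n - k11 - k12 - k21    # journeys with neither page
        scores[(a, b)] = llr(k11, k12, k21, k22)
    return scores


# Toy journeys: each list is the set of pages one session visited.
journeys = [
    ["/vehicle-tax", "/sold-bought-vehicle"],
    ["/vehicle-tax", "/sold-bought-vehicle", "/check-vehicle-tax"],
    ["/vehicle-tax", "/check-vehicle-tax"],
    ["/book-driving-test", "/driving-test-cancellations"],
]
for (a, b), score in sorted(related_link_scores(journeys).items(), key=lambda kv: -kv[1]):
    print(f"{a} <-> {b}: {score:.2f}")
```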

Actionable Takeaway #1
How to make better use of user journey data from BigQuery and derive proxy metrics for testing hypotheses through A/B testing: why you should give your technical people time and space to develop their data pipelines, how to evaluate whether your algorithms are any good, and why you need a multidisciplinary team.
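As a rough illustration of what a journey-level proxy metric might look like, the sketch below groups hit data into sessions in BigQuery and computes, per A/B variant, the share of journeys that used a related link. Every project, dataset, table and column name here is hypothetical and does not reflect the GOV.UK schema.

```python
"""Sketch: derive a journey-level proxy metric for an A/B test from BigQuery.

Requires google-cloud-bigquery and suitable credentials; all names are hypothetical.
"""
from google.cloud import bigquery

QUERY = """
SELECT
  ab_variant,
  COUNTIF(used_related_link) / COUNT(*) AS related_link_journey_rate
FROM (
  SELECT
    session_id,
    ANY_VALUE(ab_variant) AS ab_variant,
    -- Did any hit in the session arrive via a related link?
    LOGICAL_OR(came_from_related_link) AS used_related_link
  FROM `my-project.journeys.page_hits`
  WHERE hit_date BETWEEN '2019-01-01' AND '2019-01-07'
  GROUP BY session_id
)
GROUP BY ab_variant
"""

client = bigquery.Client()
for row in client.query(QUERY).result():
    print(f"{row.ab_variant}: {row.related_link_journey_rate:.2%} of journeys used a related link")
```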

Actionable Takeaway #2
How to generate recommended links for your content by feeding content similarity and user journey network data into different algorithms.
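The journey-network side of this is sketched after the abstract above; for the content-similarity side, a minimal sketch using TF-IDF and cosine similarity is shown below. The vectoriser choice and the toy page descriptions are assumptions, not the algorithms the team shipped.

```python
"""Sketch: candidate related links from content similarity alone.

Toy page text; in practice you would feed in the body text of each content
item and keep the top-N most similar pages as candidates. Requires scikit-learn.
"""
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pages = {
    "/vehicle-tax": "Renew or tax your vehicle for the first time using the DVLA reminder letter.",
    "/sold-bought-vehicle": "Tell DVLA when you no longer own a vehicle, or you buy a vehicle.",
    "/book-driving-test": "Book your car, motorcycle or lorry driving test online.",
    "/driving-test-cancellations": "Find an earlier driving test appointment from cancellations.",
}

paths = list(pages)
doc_term = TfidfVectorizer(stop_words="english").fit_transform(pages[p] for p in paths)
similarity = cosine_similarity(doc_term)  # pages x pages matrix

TOP_N = 2
for i, path in enumerate(paths):
    # Rank every other page by cosine similarity to this one.
    ranked = sorted(
        ((similarity[i, j], paths[j]) for j in range(len(paths)) if j != i),
        reverse=True,
    )
    print(path, "->", [p for _, p in ranked[:TOP_N]])
```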

Actionable Takeaway #3
How to deploy your model and update the related links on your site periodically.
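The abstract describes moving the generated links to an S3 bucket before they are served, and refreshing them periodically. A minimal sketch of that final publishing step might look like the following; the bucket name, object key and JSON layout are assumptions, and the scheduling piece (cron, a scheduled CI pipeline, or similar) is left out.

```python
"""Sketch: publish freshly generated related links to S3 for the site to pick up.

Requires boto3 and AWS credentials; bucket and key names are hypothetical.
"""
import json

import boto3


def publish_related_links(links_by_page, bucket, key):
    """Serialise {page_path: [related page paths]} and upload it to S3."""
    body = json.dumps(links_by_page, indent=2).encode("utf-8")
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=body, ContentType="application/json")


if __name__ == "__main__":
    links = {
        "/vehicle-tax": ["/sold-bought-vehicle", "/check-vehicle-tax"],
        "/book-driving-test": ["/driving-test-cancellations"],
    }
    publish_related_links(links, bucket="example-related-links", key="suggested_related_links.json")
```

Re-running the link-generation job and this upload on a schedule keeps the links fresh as content goes stale or user needs change with the season.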