Member-only story

When you get data in sentences: how to use a spreadsheet to extract numbers from phrases

Paul Bradshaw
5 min readAug 15, 2019
This BBC story involved converting phrases into numbers that could be used in calculations

This post was originally published on the Online Journalism Blog. You can subscribe for email updates on the site.

Earlier this month the BBC Data Unit published a story on unduly lenient sentences which involved working with data that was trapped in phrases.

We needed to be able to take a collection of words such as “11 years and 5 months’ imprisonment” and convert that into something that could be used in spreadsheet calculations (specifically, comparing the lengths of time represented by two different phrases).

It’s a problem you come across every so often as a journalist — especially with FOI requests — so in this post — taken from the book Finding Stories in Spreadsheets — I’ll explain how to do that.

First, here’s what the data looks like:

Break down the steps

The first thing to do in any situation like this is break down the task into its constituent problems/challenges. Well, we need to:

  • Identify where the number of years is stated
  • Extract that number of years
  • Identify where the number of months is stated
  • Extract that number of months

--

--

Paul Bradshaw
Paul Bradshaw

Written by Paul Bradshaw

Data journalist and course leader of the MA in Data Journalism at Birmingham City University. Author of the Online Journalism Handbook.

No responses yet