Member-only story
When you get data in sentences: how to use a spreadsheet to extract numbers from phrases
This post was originally published on the Online Journalism Blog. You can subscribe for email updates on the site.
Earlier this month the BBC Data Unit published a story on unduly lenient sentences which involved working with data that was trapped in phrases.
We needed to be able to take a collection of words such as “11 years and 5 months’ imprisonment” and convert that into something that could be used in spreadsheet calculations (specifically, comparing the lengths of time represented by two different phrases).
It’s a problem you come across every so often as a journalist — especially with FOI requests — so in this post — taken from the book Finding Stories in Spreadsheets — I’ll explain how to do that.
First, here’s what the data looks like:
Break down the steps
The first thing to do in any situation like this is break down the task into its constituent problems/challenges. Well, we need to:
- Identify where the number of years is stated
- Extract that number of years
- Identify where the number of months is stated
- Extract that number of months