Saturday, 19 November 2011

Data driven journalism

“Journalism has always been about reporting facts and assertions and making sense of world affairs. No news there. But as we move further into the 21st century, we will have to increasingly rely on “data” to feed our stories, to the point that “data-driven reporting” becomes second nature to journalists.”
Zach Beauvais

The above statement resonates closely with the work of American journalist and Freedom of Information activist Heather Brooke, who helped to expose the UK parliamentary expenses scandal. However, as a trainee journalist, the last thing that comes to my mind is data. Statistics are not my forte. I was terrible at maths in school and, to be fair, I have managed to get through life knowing the basics: addition, subtraction, division and multiplication. So, imagine my face when, in digital journalism class, there was talk of mean, median and mode. Huh? I never thought I’d have to do averages again. Ever.

But my lecturer, Andy Dickinson (aka digidickinson), put things into perspective when he said: “The mark of a good journalist is not knowing how to do everything, but, knowing whom to ask.” This statement alone was enough to get my attention.

In three hours, I learned that “data” can in fact provide interesting stories and is not as hard to analyse as it seems. The class began by looking at how to create forms and spreadsheets in Google Docs and how these can be used to obtain, arrange and visually display information or results. Andy then taught us how to ‘scrape’ and ‘clean’ data using HTML formulas.


Scraping is a way to get information out of web pages and into a spreadsheet. Google Docs has a clever way to ‘scrape’ websites built into its spreadsheet tool. It involves a simple copy and paste using a not-so-simple formula and, voilà, the data magically appears from your chosen web page in your spreadsheet. Thankfully for me, I wrote the instructions down.
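For anyone curious about what a scraping formula does behind the scenes, here is a minimal sketch in Python. It is not the formula from class (Google's spreadsheet tool has an IMPORTHTML function that does this kind of job, but I am not claiming that is the exact one we used). The page content, the album names and the `scrape_table` helper are all made up for illustration.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # the row currently being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Hypothetical helper: return the table cells in `html` as a list of rows."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# A made-up page standing in for a real web page with a table on it.
SAMPLE_PAGE = """
<table>
  <tr><th>Album</th><th>Sales (millions)</th></tr>
  <tr><td>Album A</td><td>40</td></tr>
  <tr><td>Album B</td><td>35</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_PAGE)
for row in rows:
    print(row)
```

Each row comes out as a list of cells, which is essentially what lands in the spreadsheet after the copy and paste.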

As a class, we copied the table of best-selling albums from Wikipedia (for practice purposes only) into our Google spreadsheet. Once the data is in the spreadsheet, you can ‘clean’ it, which means getting rid of any errors. Depending on what sort of data you are dealing with, the super helpful formulas (and this I did find impressive) will then find the average or, in mathematical terms, the mean, median and mode of your selected cells within that spreadsheet. You just have to know the right formula to tell the spreadsheet what to do.
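To show what ‘cleaning’ and the three averages amount to, here is a small Python sketch. The cell values are invented, and the `clean` helper is only one assumed way of tidying a column; in a spreadsheet, the AVERAGE, MEDIAN and MODE functions do the same sums for you.

```python
from statistics import mean, median, mode

# Made-up column of scraped cells: some have stray spaces, some are not numbers.
raw_cells = ["40", " 35", "35 ", "n/a", "30", "", "20"]


def clean(cells):
    """Hypothetical cleaning step: keep only the cells that are numbers."""
    numbers = []
    for cell in cells:
        try:
            numbers.append(float(cell.strip()))
        except ValueError:
            # Skip blanks and text such as "n/a".
            continue
    return numbers


sales = clean(raw_cells)

average = mean(sales)      # add everything up, divide by how many values there are
middle = median(sales)     # the middle value once the list is sorted
most_common = mode(sales)  # the value that appears most often

print(sales, average, middle, most_common)
```

The mean is what most people call "the average"; the median and mode are useful when one or two extreme values would otherwise skew the picture.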

I can see this really helping journalism. Better still, the formulas can easily be found on the internet, so you don’t actually have to do any maths!

The “daddy of data scraping”

Funded by Channel 4, Scraperwiki is a website predominantly for computer programmers to help people who need data, and it is the current fave among journalists as the go-to place for data-related information. As we discussed in the lecture, the relationship between journalists and programmers is important. In many cases it helps to serve the public interest, especially if the data in question can reveal valuable and newsworthy material, just as the analysis of MPs' expenses did.

Junar was another website that Andy introduced us to. Not only can you collect, organise and use your own data there, but it also holds data that you can explore and that might be of use to you. It has a social element too, allowing users to share their data on social networking platforms. Now, this is more my cup of tea.

The Guardian's DATABLOG, edited by Simon Rogers, a pioneer in data driven journalism, is a great place for inspiration when it comes to how to report stories based on data and, more importantly, how to present the data in visual form. More recently, the DATABLOG published a world map dotted with the Occupy protests, which on its own told a powerful story. It goes to show that data can be used in so many different ways.


How do you present data in an engaging yet simple form? This part I didn’t mind. Bar charts, pie charts, line graphs and tables all work well with stories based on facts and figures. This sort of traditional visualisation goes hand in hand with stories about growth, decline, change, comparison and ranking. However, if you want a simpler yet effective way of telling a story, take a look at word clouds. The tool I used is a toy for generating “word clouds” from text that you provide, and the clouds give greater prominence to words that appear more frequently in the source text. It is free and easy to use.
Check out the word cloud above, which I created from a story on the Guardian's website headlined “Hamid Karzai tells loya jirga: no US military pact until night raids cease”.

The most prominent words in the news report are Afghan, Karzai, military, night, raids, partnership, sovereignty, national, Afghanistan, strategic and operations. The word cloud instantly conveys the angle of this story, and you now have a good idea of what's written in the full report.
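The idea behind a word cloud is just counting. Here is a rough Python sketch of that counting step, assuming a tiny hand-picked list of common words to ignore. The sample sentence is made up (it is not the Guardian story), `word_frequencies` is a hypothetical helper, and I do not know how the actual tool does it internally.

```python
import re
from collections import Counter

# A tiny, hand-picked set of common words to leave out of the cloud (an assumption).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "until"}


def word_frequencies(text):
    """Hypothetical helper: count how often each non-common word appears."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


# Made-up sample text standing in for a news report.
sample = "Night raids must end. The raids anger the public, and the raids go on."

counts = word_frequencies(sample)
top = counts.most_common(1)  # the word that would be drawn biggest

print(counts)
```

A word cloud then draws each word at a size in proportion to its count, which is why the words a story keeps returning to jump out straight away.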

By the end of the class, I felt much more comfortable about having to analyse and organise data for journalistic use.
