Data + Soul = Story: April 2015

Thursday, April 30, 2015

Critical analysis of a data-driven news story

For this post I will be analyzing a piece of data-driven science journalism by Alister Doyle for Reuters with the headline "China to surpass U.S. as top cause of modern global warming".

The top line for this story is that if you add up all the CO2 emissions for each country from 1990 until now, China has caught up to the US in cumulative emissions and is projected to exceed it by the end of this year or sometime next year. Or as Mr Doyle put it in his article:

"China is poised to overtake the United States as the main cause of man-made global warming since 1990, the benchmark year for U.N.-led action, in a historic shift that may raise pressure on Beijing to act.

China's cumulative greenhouse gas emissions since 1990, when governments were becoming aware of climate change, will outstrip those of the United States in 2015 or 2016, according to separate estimates by experts in Norway and the United States."

I think that both my summary and Mr Doyle's description are a bit wordy and could be aided by a chart or visualization. Mr Doyle uses carbon dioxide emissions data from two independent sources for this story: the Center for International Climate and Environmental Research, Oslo (CICERO) in Norway, and the World Resources Institute (WRI), a US-based think-tank. The main point of the story is supported by just two numbers, 151 and 147 billion tonnes, which correspond to the cumulative CO2 emissions from 1990 to 2016 for China and the US respectively. I think that a bar chart (similar to the one I made in a previous post on CO2 emissions) would be a really nice way to break up the text in the original article and give the reader some sense of scale.

I played around a bit with CO2 emissions data I got from The World Bank, since there is no link in the article to the data mentioned, and came up with a pretty simple and straightforward visualization that I think would strengthen the article without distracting from the story told in the text.
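The arithmetic behind the headline is simple: add up each country's yearly emissions from 1990 onward and compare the totals. Here is a minimal Python sketch of that calculation, not the method used by CICERO or WRI. The function name and the sample rows are hypothetical placeholders (they are not real emissions figures), and the text "bars" are only a rough stand-in for a proper chart.

```python
from collections import defaultdict


def cumulative_emissions(rows, start_year=1990):
    """Sum yearly emissions per country from start_year onward.

    rows: iterable of (country, year, emissions) tuples.
    Rows with missing emissions (None) are skipped.
    """
    totals = defaultdict(float)
    for country, year, emissions in rows:
        if year >= start_year and emissions is not None:
            totals[country] += emissions
    return dict(totals)


# Placeholder values for illustration only, NOT real emissions data.
sample = [
    ("A", 1989, 5.0),   # before the benchmark year, so ignored
    ("A", 1990, 1.0),
    ("A", 1991, 2.0),
    ("B", 1990, 3.0),
    ("B", 1991, None),  # missing value, so skipped
]

totals = cumulative_emissions(sample)

# Print a crude text bar chart, largest total first.
for country, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country}: {'#' * int(total)} {total}")
```

With real data, the same totals could be fed straight into whatever charting tool you prefer.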

The article does an excellent job of contextualizing the carbon emissions data with regards to the consequences of increased CO2 levels and international efforts to control them. It also has a good diversity of relevant and useful quotes from experts on the topic.

"A few years ago China's per capita emissions were low, its historical responsibility was low. That's changing fast," said Glen Peters of CICERO.

The rise of cumulative emissions "obviously does open China up to claims of responsibility from other developing countries," said Daniel Farber, a professor of law at the University of California, Berkeley.

"All countries now have responsibility. It's not just a story about China -- it's a story about the whole world," said Ottmar Edenhofer of the Potsdam Institute for Climate Impact Research and co-chair of a U.N. climate report last year.

"China is acting. It has acknowledged its position as a key polluter," said Saleemul Huq, of the International Institute for Environment and Development in London.

Any fair formula for sharing out that trillion tonnes, or roughly 30 years of emissions at current rates, inevitably has to consider what each country has done in the past, said Myles Allen, a scientist at Oxford University.

My only criticisms are the lack of a chart or some kind of visualization to really emphasize the numbers and how they relate, and that the data used for the story is not provided, nor linked to anywhere in the article.

Thursday, April 16, 2015

Two of the most useful spreadsheet tricks ever

Before I get going on making some sweet-looking data visualizations, there are a couple of data-cleaning steps which make a world of difference. Both can be done relatively quickly (and painlessly) using OpenRefine (formerly known as Google Refine), "a powerful tool for working with messy data: cleaning it; transforming it from one format into another; extending it with web services; and linking it to databases."

OpenRefine offers many useful options for exploring data, including what it calls "facets" and "filters". A simple interface also tracks every step you make, allowing easy undos and redos anywhere along the way. I'm going to go over two functions which I find incredibly useful in preparing data for later visualization: lengthening a wide dataset, and merging two datasets with a common column.

First, giving your data a growth spurt by trimming the fat. What the heck do I mean by that?

Well, lots of datasets which are downloadable from the internet come in a wide format. For example, World Bank data has its datasets sorted by year across columns. I'm going to use the data for carbon dioxide emissions to illustrate this function, and you can download the data here to follow along as well.

Create a project and upload the .csv file, leaving the default settings as is. You should get something that looks like this:

As you can see, individual years run across the columns making the dataset wide. What we want is to get all the yearly data into two columns, one with the year and the other with CO2 emissions. Here comes the easy part. Left click on the downward arrow next to the column heading 1960, hover over Transpose and select Transpose cells across columns into rows.

Fill in the dialogue box like below (from column 1960 to the last column it will create 2 new columns). There are some missing data values for certain countries so we don't want to Ignore blank cells (uncheck) but it's important to Fill down in other columns (check).

Now we have our data nice and tight width-wise, and 13640 rows long.
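If you would rather script this step than click through it, the same wide-to-long reshaping can be sketched in a few lines of Python using only the standard library. This is an illustration of the idea, not what OpenRefine does internally. The function name, the output column names year and co2, and the tiny sample CSV are all assumptions made up for the example; country and iso_a3 are the identifier columns mentioned in this post.

```python
import csv
import io


def wide_to_long(csv_text, id_columns, key_name="year", value_name="co2"):
    """Turn one-column-per-year data into one row per (country, year)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    long_rows = []
    for row in reader:
        for column, value in row.items():
            if column in id_columns:
                continue  # identifier columns are copied, not transposed
            # "Fill down": repeat the identifier values on every new row.
            new_row = {name: row[name] for name in id_columns}
            new_row[key_name] = column
            # Blank cells are kept (as empty strings), not ignored.
            new_row[value_name] = value
            long_rows.append(new_row)
    return long_rows


# Made-up sample with placeholder countries and values.
sample_csv = "country,iso_a3,1960,1961\nX,XXX,1.5,\nY,YYY,2.0,2.5\n"

rows = wide_to_long(sample_csv, ["country", "iso_a3"])
for r in rows:
    print(r)
```

Each input row produces one output row per year column, which is why the lengthened dataset ends up with so many more rows than the original.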

Second, adding data from another source can help add valuable details and more categorizations/filters for your original data. You can download the second dataset here. This has information about the region and income group for every country, which we're going to merge with our recently lengthened CO2 emissions dataset.

Like before, create a new project and import the nations.csv file, keeping the default import settings. Return to the CO2 emissions browser tab. Left click on the downward arrow next to country, hover over Edit columns and select Add column based on this column. (Note: you could also do this using the iso_a3 column, since it also appears in the nations.csv dataset.)

Now we're going to use the GREL (Google Refine Expression Language) command: cell.cross("string projectName", "string commonColumn").cells["string columnName"].value[0]

Keep the quotation marks, but replace string projectName with nations csv, string commonColumn with country, and string columnName with region.

This will add a new column to the CO2 emissions dataset with the appropriate regions for the corresponding countries. Repeat the GREL cell.cross command to add a column containing income_group.
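Conceptually, the cross lookup is a join: for each row, find the first row in the other project whose common column has the same value, and copy one field across. Here is a rough Python sketch of that idea, assuming the data is already loaded as lists of dictionaries. The function name and the sample values are hypothetical; only the column names country, region and income_group come from the steps above.

```python
def add_column_from_lookup(rows, lookup_rows, common_column, new_column):
    """Copy new_column from lookup_rows onto rows, matching on common_column."""
    # Index the lookup table, keeping the first match for each key,
    # which mirrors taking element [0] of the cross result.
    index = {}
    for lookup_row in lookup_rows:
        index.setdefault(lookup_row[common_column], lookup_row)

    merged = []
    for row in rows:
        match = index.get(row[common_column])
        new_row = dict(row)  # leave the original row untouched
        # Countries missing from the lookup table get None.
        new_row[new_column] = match[new_column] if match else None
        merged.append(new_row)
    return merged


# Placeholder rows for illustration only.
emissions = [
    {"country": "X", "year": "1960", "co2": "1.5"},
    {"country": "Z", "year": "1960", "co2": "0.1"},
]
nations = [
    {"country": "X", "region": "Region 1", "income_group": "Group A"},
]

with_region = add_column_from_lookup(emissions, nations, "country", "region")
with_both = add_column_from_lookup(with_region, nations, "country", "income_group")
print(with_both)
```

Calling the function twice matches the instruction to repeat the command once for region and once for income_group.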

Voila! This dataset is now ready for a nice visualization makeover. I used it to make this treemap bar chart using Tableau.

For other OpenRefine functions see https://github.com/OpenRefine/OpenRefine/wiki/GREL-Functions

Wednesday, April 15, 2015

Oklahoma - where the earth quakes, fracking down the plains

I made this visualization using data from the USGS after reading an article in The Guardian about how the spike in earthquakes in Oklahoma and nearby states is likely man-made, caused by injecting fracking wastewater into deep underground disposal wells.

I was inspired to make a map which could show the epicentre and magnitude of earthquakes over a period of time, and found a great example of this with a CartoDB Torque map.

I really like this visualization because I think it gives a good historical context of earthquakes in this area going back to 1975, and then shows the explosion that begins in 2009. The map may proceed through time a bit quickly, which is a fairly simple element to change using CartoDB's interface. I think an accompanying static bar chart with the number and magnitudes of earthquakes over time in Oklahoma and the surrounding states would help supplement the dynamic nature of the map.
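The numbers for such a bar chart boil down to counting events per year, optionally above some magnitude threshold. A minimal Python sketch of that tally follows. The function name, the threshold parameter and the sample events are all hypothetical; they are not taken from the USGS data.

```python
from collections import Counter


def quakes_per_year(events, min_magnitude=0.0):
    """Count earthquakes per year at or above min_magnitude.

    events: iterable of (year, magnitude) tuples.
    Returns a dict ordered by year.
    """
    counts = Counter(
        year for year, magnitude in events if magnitude >= min_magnitude
    )
    return dict(sorted(counts.items()))


# Made-up events for illustration only, NOT real earthquake records.
events = [(2008, 2.1), (2009, 3.0), (2009, 3.5), (2010, 4.0)]

for year, count in quakes_per_year(events, min_magnitude=3.0).items():
    print(year, "#" * count)
```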

On a side note, I also found the following chart made by Steve Maier using Plotly, which tells another very important story about the history of earthquakes in Oklahoma.

Oklahoma Earthquakes
1990-1999: 788 quakes (green)
2004-2013: 6,569 quakes (red)

Monday, April 13, 2015

Another interactive chart courtesy of my new best friend...Tableau

Data from The World Bank

The inspiration for this visualization was one I saw on the website Gapminder charting the Wealth & Health of Nations. The graph shows how long people live (average life expectancy) and how much money they earn (GDP per capita) for each country from 1800 to 2013. In my opinion, this interactive graphic is a bit overwhelming for inclusion in an article. However, it is an excellent source of bits and pieces to borrow for your own visualizations.

For example, in my Tableau version, where I replaced GDP per capita with health expenditure as a per cent of GDP, I kept several elements from the Gapminder chart:

1) I used a bubble chart with the area of the bubble corresponding to the population size of the country (note: Gapminder allows you to change this feature for a whole range of indicators).
2) I included a slider which changes the chart according to the specific year (note: Gapminder allows manual control over the time slider or you can hit Play and it will automatically proceed through time).
3) I included a side bar for selecting particular countries by their name, region, and/or income group. While Gapminder went with a list of all the countries to select, I used a search box for mine.
4) A map colour-coded by region is also on display and changes as specific countries are selected. This is one feature in which I think I improved over Gapminder's. My graphic highlights specific countries when selected, whereas the Gapminder one only highlights the bigger region.
5) When hovering over the bubble chart, the data for the specific country is displayed. This is another aspect I think I improved over the Gapminder example. My chart shows this data next to the bubble hovered over. The Gapminder version just highlights the data values on the x- and y-axes.
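On point 1, sizing bubbles by area rather than by radius matters: if the radius scaled directly with population, a country with four times the population would look sixteen times bigger. Here is a small Python sketch of area-proportional scaling. It is only an illustration of the general idea, not a description of how Tableau or Gapminder size their marks, and the function name and the max_radius default are assumptions.

```python
import math


def bubble_radius(population, max_population, max_radius=40.0):
    """Radius such that bubble AREA is proportional to population.

    The largest population gets max_radius; everything else scales
    with the square root of its share of that maximum.
    """
    if max_population <= 0:
        raise ValueError("max_population must be positive")
    return max_radius * math.sqrt(population / max_population)


# A quarter of the population gives half the radius (a quarter of the area).
print(bubble_radius(25, 100))
print(bubble_radius(100, 100))
```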

Saturday, April 11, 2015

Tableau = awesome; CO2 emissions = not-so-awesome

What I really like about this visualization is that it helps tell two stories about the same thing: CO2 emissions.

The treemap bar chart on the left clearly shows that overall CO2 emissions are rising annually, and that China/SE Asia is the region contributing the largest share of those emissions.

The line chart on the right tells a different story. It shows that even though China emits the most CO2, its per capita emissions are actually quite low, and significantly lower than those of the US.

What is alarming is what happens when you pair the total emissions story with the per capita story. China is trending upwards in both, which has major global consequences for climate change.
