
Thursday, May 7, 2015

♫ These are a few of my favourite data-things... ♫

This is the era of data, and journalism is evolving with the times. There are so many tools and resources available for data journalists (especially science data journalists), with more being added seemingly every other day. It can get a bit overwhelming (and downright impossible) to keep up with all the latest developments, so I've put together a list of some of my favourite sources to help get you started.

Sources: 

World Health Organization (WHO) - Global Health Observatory Data Repository
Provides data (for viewing and/or downloading) pertaining to health-related topics such as health systems, infectious diseases, and public health and environment.

World Bank - Open Data
Free and open access to data about development in countries around the world (a short code sketch for pulling from its API follows at the end of this list).

NASA - Data Portal
Growing catalog of publicly available datasets relating to both Space and Earth Science.

ClinicalTrials.gov - Registry & Results Database
Database of publicly and privately supported clinical studies of human participants conducted around the world. A service of the US National Institutes of Health (NIH).

Data.gov - US Government's Open Data
Over 130,000 datasets on topics such as Agriculture, Education, Public Safety, and Science & Research.
(see also: Data.gov.uk - UK Transparency and Open Data team)
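If you're comfortable with a little code, several of these sources also offer APIs. As a rough sketch (not an official recipe), here's how you might pull a single indicator from the World Bank API using Python's requests library; the indicator code (SP.DYN.LE00.IN, life expectancy at birth) and the year are just examples, so check the API documentation for whatever you actually need.

```python
import requests

# World Bank API: life expectancy at birth (indicator SP.DYN.LE00.IN),
# all countries, for a single year, returned as JSON.
url = "https://api.worldbank.org/v2/country/all/indicator/SP.DYN.LE00.IN"
params = {"format": "json", "date": "2013", "per_page": 400}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()

# The first element is paging metadata; the second is the list of records.
metadata, records = response.json()

for record in records[:5]:
    country = record["country"]["value"]
    value = record["value"]  # may be None if no data was reported
    print(f"{country}: {value}")
```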

Scrapers:

kimono - A wonderful web browser plugin that allows you to easily (no coding required) turn websites into APIs and extract only the data you want. Its infinite scroll and pagination functions are extremely useful for websites where the contents of the page expand as you scroll to the bottom or continue across more pages. A chat window that connects you with a helping hand, and the kimono blog, are also excellent features for troubleshooting any issues and getting involved in the wider kimmunity.

import.io - Another great online tool to turn web pages into data with no coding required. This also comes with a blog showcasing how the community is using import.io. A new feature allows you to send your extracted data straight to Plotly for streamlining the visualization of your just-scraped data.
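If a page won't cooperate with point-and-click tools, the coded equivalent is only a handful of lines. Here's a minimal sketch using Python's requests and BeautifulSoup libraries; the URL and the table layout are made up purely for illustration, so adapt the selectors to whatever page you're actually scraping.

```python
import csv
import requests
from bs4 import BeautifulSoup

# Hypothetical page with an HTML table of data we want to extract.
url = "https://example.com/some-data-table"

html = requests.get(url, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Grab every row of every table, keeping the text of each cell.
rows = []
for tr in soup.select("table tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

# Save the scraped rows to a CSV, ready for cleaning and charting.
with open("scraped.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```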

Cleaner:

Google/Open Refine - My favourite tool for cleaning and transforming messy data (see previous blogpost). The best feature is that every action or operation on the data is recorded and stored in the order performed. This means mistakes can be corrected with a simple undo, and the sequence of operations can be copied to quickly repeat the process on another (similar?) dataset.

*I've shared tools for scraping and cleaning that don't require coding, but they can be modified and optimized with some coding knowledge. A great (free) online resource for learning pretty much everything web-coding related is W3Schools.
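In that spirit, here's a rough sketch of what a few typical cleaning steps might look like in Python's pandas library; the file and column names are hypothetical, but the operations (trimming whitespace, coercing numbers, dropping duplicates and blanks) mirror the kind of transformations you'd otherwise click through in OpenRefine.

```python
import pandas as pd

# Load the scraped/downloaded data (file and column names are hypothetical).
df = pd.read_csv("scraped.csv")

# Trim stray whitespace and standardize capitalization in a text column.
df["country"] = df["country"].str.strip().str.title()

# Coerce a numeric column, turning unparseable entries into missing values.
df["value"] = pd.to_numeric(df["value"], errors="coerce")

# Drop duplicate rows and rows missing the value we care about.
df = df.drop_duplicates().dropna(subset=["value"])

df.to_csv("cleaned.csv", index=False)
```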

Visualizations:

tableau public - By far my favourite tool for building interactive charts. It also has a dashboard feature, which allows you to combine multiple charts and/or maps into more complex visualizations that can accentuate a particular point or angle and help weave together a narrative (see my CO2 emissions example, health spending and life expectancy example, and my other CO2 emissions example).

cartoDB - A mapmaking tool for anything from the localized city level to countries on the global scale. Torque is a new-ish feature that allows the map to change over time in an automatic, dynamic way (see my earthquake example). CartoDB maps are styled with CartoCSS, a fairly straightforward styling language, but the interface is designed so that no coding is required. In case you do want to modify your maps in a way that requires CartoCSS, or just want the basics and special tricks for making interactive maps with CartoDB, they offer free webinars.

plotly - An easy-to-use and very useful tool for graphing data and finding the best chart-type to maximize the soul of the data (see earthquake depth example). The Plotly Blog is also a great resource for tips on choosing the right type of chart, seeing what other people have created, and maybe even showcasing a bit of your own work. (A short sketch using Plotly's Python library follows at the end of this list.)

Datawrapper - Chart/map-making tool with the tagline: "create charts and maps in just four steps." Like Plotly it's very easy to use, and has a simple interface for customizing your visualization. A chart gallery shows the more than 100,000 charts that have been created using Datawrapper.
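As mentioned above, Plotly also has a Python library, so once your data are clean you can build an interactive chart in a few lines of code. A minimal sketch using the plotly.graph_objects interface, with made-up earthquake numbers standing in for real data:

```python
import plotly.graph_objects as go

# Made-up example data: earthquake depth vs. magnitude.
depths_km = [10, 35, 70, 120, 300]
magnitudes = [4.5, 5.1, 6.0, 5.7, 6.8]

fig = go.Figure(data=go.Scatter(x=depths_km, y=magnitudes, mode="markers"))
fig.update_layout(
    title="Earthquake magnitude vs. depth (example data)",
    xaxis_title="Depth (km)",
    yaxis_title="Magnitude",
)

# Write a standalone interactive HTML chart you can open in a browser.
fig.write_html("earthquakes.html")
```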

Websites:

The Upshot - Online news and data visualization site from The New York Times.

FiveThirtyEight - Started by Nate Silver as a politics-focused data blog at The New York Times, and later acquired by ESPN. In addition to covering politics, FiveThirtyEight also touches on economics, sports, and SCIENCE!

theguardian datablog - Data journalism courtesy of The Guardian.

Science data journalists:

Peter Aldhous - Currently a science and health reporter for BuzzFeed News; he has previously worked at Nature and New Scientist.

David Herzog - A veteran investigative reporter and data journalist, and the academic adviser to the National Institute for Computer-Assisted Reporting.

Christie Aschwanden - Lead science writer for FiveThirtyEight.com and health columnist for The Washington Post.


This list is by no means perfect or exhaustive. There are so many different sources of data, a constant progression of the tools to acquire, clean, and visualize data, and so many websites/blogs/journalists/data-nerds with their own unique skills and perspectives. The idea is not to tell you what you should or shouldn't do, but to give you a grounding for what's out there, what I personally use, and to help you find your unique voice and style, and impart a bit of your soul into the data.
