Essential Tools for Data Journalism: A 2023 Guide


Data journalism has evolved from a specialized niche to a fundamental component of modern reporting. The ability to find, analyze, and visualize data effectively has become an essential skill for journalists across beats. This comprehensive guide explores the latest tools, resources, and techniques that are transforming how reporters work with data to create impactful stories.

The Data Journalism Workflow

To understand the tools landscape, it's helpful to organize them according to the typical data journalism workflow:

  1. Finding and acquiring data
  2. Cleaning and processing data
  3. Analyzing and exploring data
  4. Visualizing data
  5. Publishing and presenting data stories

Each phase requires different capabilities, and while some tools specialize in one area, others span multiple parts of the workflow. We'll organize this guide accordingly.

1. Finding and Acquiring Data

The starting point for any data story is locating relevant, reliable data sources. Modern tools have significantly expanded journalists' ability to discover and access data.

Data Portals and Repositories

  • Google Dataset Search - A specialized search engine for locating datasets across domains. Its user-friendly interface makes it an excellent starting point for data discovery.
  • Data.gov - The U.S. government's open data portal containing datasets from federal, state, and local agencies.
  • Eurostat - The European Union's statistical office, providing high-quality comparative data across European countries.
  • World Bank Open Data - Global development data covering population demographics, economics, and more.
  • Our World in Data - Research and data on global challenges, with excellent contextual information.

Web Scraping Tools

When data isn't available in a structured format, web scraping tools can extract information from websites:

  • ParseHub - A user-friendly application that can extract data from complex websites without requiring programming knowledge.
  • Octoparse - Handles JavaScript-heavy sites and offers both cloud and local scraping options.
  • Data Miner - A browser extension that extracts data from web pages and exports to various formats.
  • Beautiful Soup - A Python library for web scraping that offers more flexibility for developers.
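To make the last option concrete, here is a minimal Beautiful Soup sketch. It parses an inline HTML snippet rather than a live page (the table, agency names, and figures are illustrative placeholders; in practice you would first download the page, for example with the requests library):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for a fetched page.
html = """
<table id="budget">
  <tr><th>Agency</th><th>Budget</th></tr>
  <tr><td>Parks</td><td>1200000</td></tr>
  <tr><td>Transit</td><td>3400000</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
# Skip the header row, then pull the text out of each data cell.
for tr in soup.find("table", id="budget").find_all("tr")[1:]:
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append({"agency": cells[0], "budget": int(cells[1])})

print(rows)
```

The same pattern (locate a container, loop over repeated elements, extract text) covers most static-page scraping jobs; JavaScript-heavy sites are where tools like Octoparse earn their keep.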
Data journalists often need to search through multiple sources to find relevant datasets

Freedom of Information Platforms

Tools to streamline the process of requesting public information:

  • MuckRock - A platform for filing, tracking, and sharing public records requests in the U.S.
  • WhatDoTheyKnow - A UK-focused service for making Freedom of Information requests.
  • Alaveteli - Open-source software for building FOI request platforms, used in multiple countries.

"The data discovery phase sets the foundation for your entire story. Invest time in finding the most comprehensive and reliable sources rather than settling for the first dataset you encounter."

— Sarah Cohen, Pulitzer Prize-winning data journalist

2. Cleaning and Processing Data

Raw data rarely comes in a ready-to-analyze format. Data cleaning—identifying and correcting errors, handling missing values, and restructuring data—often consumes the most time in the data journalism workflow.

General-Purpose Data Cleaning Tools

  • OpenRefine - A powerful, free tool for working with messy data. Particularly useful for cleaning inconsistent formatting, splitting columns, and removing duplicates.
  • Trifacta Wrangler - Offers guided data transformation with an intuitive interface and automatic suggestions for cleaning operations.
  • Excel Power Query - Built into Microsoft Excel, allows for reproducible data cleaning operations and combining multiple data sources.
  • Google Sheets - Free and accessible with surprisingly powerful data cleaning capabilities through functions and add-ons.

Specialized Cleaning Tools

  • CSVKit - Command-line tools for working with CSV files, offering powerful options for examining, filtering, and transforming data.
  • Tabula - Extracts tables from PDF files, a common challenge in data journalism.
  • Workbench - A journalism-focused platform that combines data cleaning with analysis and visualization in a collaborative environment.

Programming Libraries for Data Cleaning

For journalists with coding experience, these libraries offer powerful cleaning capabilities:

  • Pandas - A Python library that has become the standard for data manipulation and cleaning.
  • tidyr/dplyr - R packages for data cleaning and transformation with an intuitive syntax.
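As a taste of what Pandas makes routine, here is a small cleaning sketch covering three common problems: inconsistent capitalization, duplicate rows, and missing values. The city names and counts are made up for illustration:

```python
import pandas as pd

# Hypothetical messy extract with mixed-case names,
# exact duplicates, and a missing value.
raw = pd.DataFrame({
    "city": ["Austin", "austin", "Dallas", "Dallas", None],
    "complaints": [120, 120, 85, 85, 40],
})

clean = (
    raw
    .dropna(subset=["city"])                        # drop rows with no city
    .assign(city=lambda d: d["city"].str.title())   # normalize capitalization
    .drop_duplicates()                              # remove exact duplicates
    .reset_index(drop=True)
)

print(clean)
```

Chaining the steps like this keeps the cleaning reproducible: rerun the script on a fresh data dump and you get the same transformations, which is exactly what a methodology note should be able to promise.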
Data cleaning is often the most time-consuming part of the data journalism process

3. Analyzing and Exploring Data

Once your data is clean, the next step is analysis—finding patterns, outliers, and insights that will form the basis of your story.

Spreadsheet Applications

  • Microsoft Excel - The industry standard with robust functionality for sorting, filtering, pivot tables, and basic statistical analysis.
  • Google Sheets - Free, cloud-based alternative with strong collaboration features and improving analytical capabilities.
  • Airtable - Combines spreadsheet functionality with database features, useful for complex data relationships.

Statistical Analysis Tools

  • JASP - Free, user-friendly statistical software designed for researchers without extensive statistical backgrounds.
  • SPSS - Professional statistical software with a graphical interface, commonly used in social sciences.
  • Stata - Powerful software for data analysis, particularly strong for econometrics and panel data.

Data Analysis Programming Languages

  • Python with Pandas, NumPy, and Statsmodels - Powerful ecosystem for data analysis with extensive documentation and community support.
  • R with tidyverse - Statistical programming language specifically designed for data analysis and visualization.
  • SQL - Essential for working with database data, allowing specific queries and aggregations.

No-Code/Low-Code Analysis Platforms

  • Tableau Prep - Visual data preparation and analysis tool with strong integration into Tableau's visualization platform.
  • KNIME - Open-source platform for creating data workflows through a visual programming interface.
  • Workbench - Journalism-focused platform (also covered under data cleaning above) with analysis capabilities designed for storytelling.

4. Visualizing Data

Effective data visualization transforms numbers into compelling visual narratives that engage readers and communicate insights clearly.

Interactive Visualization Tools

  • Datawrapper - Designed specifically for newsrooms, creating responsive charts, maps, and tables with minimal technical expertise required.
  • Flourish - User-friendly platform for creating interactive visualizations, with templates designed for storytelling.
  • Tableau Public - Free version of the powerful Tableau visualization software, with extensive capabilities for interactive dashboards.
  • Power BI - Microsoft's business intelligence tool, increasingly used in data journalism for complex visualizations.
Modern visualization tools allow journalists to create interactive graphics without coding

Mapping Tools

  • QGIS - Free, open-source Geographic Information System with extensive mapping capabilities.
  • Mapbox - Platform for custom, interactive maps with strong design control.
  • Kepler.gl - Open-source geospatial analysis tool for large datasets, creating beautiful map visualizations.
  • CARTO - Location intelligence platform with strong analysis and visualization features.
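All four of these tools can ingest GeoJSON, the standard interchange format for geospatial data, so it is worth seeing how little it takes to produce. The sketch below writes a valid FeatureCollection with Python's standard json module; the coordinates and properties are illustrative placeholders (note that GeoJSON orders coordinates longitude first, then latitude):

```python
import json

# Two hypothetical point features, e.g. geocoded incident locations.
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-73.99, 40.73]},
        "properties": {"name": "Incident A", "count": 12},
    },
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-73.95, 40.78]},
        "properties": {"name": "Incident B", "count": 7},
    },
]
collection = {"type": "FeatureCollection", "features": features}

# A file like this can be dragged straight into Kepler.gl or loaded in QGIS.
with open("incidents.geojson", "w") as f:
    json.dump(collection, f, indent=2)
```

Keeping story data in GeoJSON means you can move between mapping tools without reformatting.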

Specialized Visualization Libraries

For custom visualization needs, these programming libraries offer extensive control:

  • D3.js - JavaScript library for creating custom, interactive data visualizations on the web.
  • Plotly - Creates interactive graphs in Python, R, or JavaScript with minimal code.
  • ggplot2 - R package based on the Grammar of Graphics principles, creating publication-quality visualizations.
  • Matplotlib/Seaborn - Python visualization libraries with extensive functionality.
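For a sense of the library route, here is a minimal Matplotlib sketch producing a publication-style bar chart. The yearly counts and the filename are invented for the example:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt

# Hypothetical figures for a simple yearly bar chart.
categories = ["2021", "2022", "2023"]
values = [14, 22, 31]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(categories, values, color="#3182bd")
ax.set_title("FOI requests filed per year")
ax.set_ylabel("Requests")
# Remove the top and right borders for a lighter, chart-style framing.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.savefig("foi_requests.png", dpi=150)
```

The payoff of the code route is repeatability: when the data updates, rerunning the script regenerates the graphic with no manual redrawing.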

Chart Design Tools

  • Adobe Illustrator - Industry-standard vector graphics editor for refining and customizing charts.
  • Figma - Collaborative design tool increasingly used for creating and prototyping data visualizations.
  • RAWGraphs - Open-source tool for creating uncommon chart types from spreadsheet data.

5. Publishing and Presenting Data Stories

The final stage involves packaging your data story for publication and creating interactive elements that engage your audience.

Interactive Story Platforms

  • Scrollytelling platforms - Tools like Shorthand and Flourish Stories allow creation of scroll-driven narratives with data visualizations.
  • ArcGIS StoryMaps - Combines maps with narrative text and other content for place-based storytelling.
  • Idyll - Open-source presentation tool for creating interactive data-driven articles.

Embedding and Integration Tools

  • Pym.js - JavaScript library that helps embed responsive iframes, useful for integrating interactive visualizations.
  • Observable - Platform for creating and sharing data visualizations with live code and explanations.
  • GitHub Pages - Free hosting for web-based data visualizations and interactive stories.
Modern data stories must work across devices and engage readers through interactive elements

Building Your Data Journalism Toolkit

With so many options available, how should journalists approach building their personal toolkit? Consider these recommendations based on experience level:

For Beginners

Start with these accessible tools that offer quick results with minimal technical expertise:

  • Data Acquisition: Google Dataset Search, government open data portals
  • Cleaning: Google Sheets, Excel with Power Query
  • Analysis: Google Sheets, Excel (focusing on sorting, filtering, and pivot tables)
  • Visualization: Datawrapper, Flourish
  • Publishing: Embedding options within your CMS

For Intermediate Data Journalists

As you grow more comfortable, expand your toolkit to include:

  • Data Acquisition: Add web scraping tools like ParseHub or Data Miner
  • Cleaning: OpenRefine, Tabula for PDF extraction
  • Analysis: Tableau Public, SQL basics
  • Visualization: Tableau Public, QGIS for mapping
  • Publishing: Scrollytelling platforms like Shorthand

For Advanced Practitioners

Those ready to invest in programming skills can leverage the most powerful tools:

  • Data Acquisition: API access, custom web scrapers with Python
  • Cleaning: Python (Pandas) or R (tidyverse) workflows
  • Analysis: Statistical modeling in R or Python
  • Visualization: D3.js, R's ggplot2, or Python visualization libraries
  • Publishing: Custom interactive applications

"The tools are means to an end, not the end itself. Choose tools that let you focus on the story the data is telling, not tools that become a story in themselves through their complexity."

— Alberto Cairo, Knight Chair in Visual Journalism at the University of Miami

Learning Resources

Developing data journalism skills requires ongoing learning. These resources can help you master the tools mentioned above:

Online Courses and Tutorials

  • Knight Center for Journalism in the Americas - Offers free and low-cost online courses specifically focused on data journalism.
  • DataCamp - Interactive courses on R, Python, SQL, and data visualization.
  • Journalism Academy's Data Journalism Course - Comprehensive training covering the entire data journalism workflow.

Communities and Forums

  • NICAR-L Mailing List - Active community of data journalists sharing tips and resources.
  • Data Journalism Slack communities - Many regional and global groups for data journalists.
  • Stack Overflow - For programming-specific questions related to data analysis.

Books and Guides

  • The Data Journalism Handbook - Comprehensive overview of principles and practices.
  • R for Data Science - Guide to data analysis with R programming.
  • Python for Data Analysis - Practical introduction to Python for data work.

Ethics and Best Practices

As you work with data tools, remember these important considerations:

Transparency

Document your data sources, cleaning methods, and analysis techniques. Consider publishing your methodology and, when possible, your data and code.

Context and Nuance

Tools can generate impressive visualizations, but ensure they accurately represent the underlying data and include appropriate context.

Privacy and Security

Be mindful of privacy implications when working with data that might identify individuals, especially vulnerable populations.

Accessibility

Ensure your visualizations are accessible to people with disabilities and provide alternative text descriptions.

Conclusion

The data journalism tools landscape continues to evolve rapidly, with new options emerging regularly. Rather than trying to master every tool, develop expertise with a core set that covers your workflow needs. Start simple and expand your toolkit as your skills and project requirements grow.

Remember that the most important element isn't the tool itself but the journalistic questions you're asking of the data. The best tools are those that fade into the background, enabling you to find and tell compelling, accurate stories hidden within the numbers.

Comments (4)

Jessica Kim

April 16, 2023 at 9:22 AM

This is incredibly thorough - thank you! As someone just starting in data journalism, the beginner recommendations are particularly helpful. I've been struggling with cleaning PDFs, so I'll definitely try Tabula.

Thomas Morgan

April 17, 2023 at 2:45 PM

I'd add that for teams working across different skill levels, establishing a common workflow is crucial. We've had success with a hybrid approach: technical team members using Python/R for heavy lifting, then exporting to formats that can be used by team members who prefer visual tools.

Marco Lee (Author)

April 17, 2023 at 4:10 PM

@Jessica - Glad you found it helpful! Tabula is definitely a game-changer for PDF data extraction. Feel free to reach out if you run into specific challenges.

@Thomas - Excellent point about hybrid workflows. That collaborative approach is something we emphasize in our data journalism course as well. Creating handoff points between team members with different technical skills can really enhance a newsroom's data capabilities.
