Automated Text Classification of News Articles: A Practical Guide

Automated text analysis methods have made possible the classification of large corpora of text by measures such as topic and tone. Here, we provide a guide to help researchers navigate the consequential decisions they need to make before any measure can be produced from the text. We consider, both theoretically and empirically, the effects of such choices using as a running example efforts to measure the tone of New York Times coverage of the economy. We show that two reasonable approaches to corpus selection yield radically different corpora and we advocate for the use of keyword searches rather than predefined subject categories provided by news archives. We demonstrate the benefits of coding using article segments instead of sentences as units of analysis. We show that, given a fixed number of codings, it is better to increase the number of unique documents coded rather than the number of coders for each document. Finally, we find that supervised machine learning algorithms outperform dictionaries on a number of criteria. Overall, we intend this guide to serve as a reminder to analysts that thoughtfulness and human validation are key to text-as-data methods, particularly in an age when it is all too easy to computationally classify texts without attending to the methodological choices therein.
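Three of the choices described above (selecting a corpus by keyword search, scoring tone with a dictionary, and scoring tone with a supervised classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. The article snippets, keyword list, and tone word lists are hypothetical and invented for illustration. A small multinomial Naive Bayes model with add-one smoothing stands in for "supervised machine learning algorithms", since the abstract does not name a specific algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: invented snippets, not drawn from the paper's corpus.
ARTICLES = [
    "Unemployment fell as the economy added jobs",
    "The city council approved a new park",
    "Stocks plunged amid recession fears",
    "Inflation eased and economic growth improved",
]

# Hypothetical search terms and tone word lists, chosen only for illustration.
KEYWORDS = {"economy", "economic", "unemployment", "recession", "inflation", "stocks"}
POSITIVE = {"added", "growth", "improved", "eased"}
NEGATIVE = {"plunged", "recession", "fears"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def select_by_keyword(articles, keywords):
    """Keep articles that contain at least one search keyword."""
    return [a for a in articles if keywords & set(tokenize(a))]


def dictionary_tone(text):
    """Label tone by counting positive words minus negative words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def train_naive_bayes(labeled):
    """Collect per-label word counts and document counts from (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict_naive_bayes(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(text):
            if token in vocab:  # words unseen in training are ignored
                score += math.log((counts[token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


corpus = select_by_keyword(ARTICLES, KEYWORDS)
# Hypothetical human codings for the three selected snippets.
model = train_naive_bayes(list(zip(corpus, ["positive", "negative", "positive"])))
```

The dictionary scorer depends entirely on its fixed word lists, while the classifier learns word weights from human-coded examples. A toy comparison like this shows only the mechanics; it says nothing about which approach performs better on real coverage, which requires validation against human codings.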

Metadata

Work Title: Automated Text Classification of News Articles: A Practical Guide
Access: Open Access
Creators:
  1. Pablo Barberá
  2. Amber E. Boydstun
  3. Suzanna Linn
  4. Ryan McMahon
  5. Jonathan Nagler
Keywords:
  1. Automated text analysis methods
  2. Automated Text Classification
  3. Practical Guide
  4. News
License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Work Type: Article
Publisher: Political Analysis
Publication Date: June 9, 2020
Publisher Identifier (DOI): https://doi.org/10.1017/pan.2020.8
Deposited: July 23, 2020
