Topic modeling data for research data management and data curation abstracts

Research articles on research data management and data curation were collected on December 19, 2018 through the Library Literature and Information Science Full Text database. The term “research data management” (in quotations) retrieved a set of 106 scholarly articles in which the search term appeared, as a phrase, somewhere in the article’s metadata (e.g., in the title, abstract, or keywords). The term “data curation” (in quotations) retrieved a set of 111 scholarly articles in which that phrase likewise appeared somewhere in the article’s metadata. A publicly accessible version of this search is available through our Zotero group. Fifteen articles appeared in both the “research data management” and the “data curation” sets; these were kept for analysis in both sets. Results were limited to scholarly (peer-reviewed) journals (i.e., no professional journals) and to articles (i.e., no book reviews and no conference papers). The database search returned 217 articles in total (106 plus 111, with the fifteen shared articles counted once in each set), all of which included an abstract.

The topics were isolated for analysis using the MAchine Learning for LanguagE Toolkit (MALLET) implementation of latent Dirichlet allocation (LDA). MALLET is a package that includes statistical natural language processing, document classification, clustering, topic modeling, and information extraction applications for text analysis. For this analysis, the following MALLET specifications were used:

- Number of topics: 35
- Remove stopwords: yes
- No. of iterations: 200
- No. of topic words printed: 20
- Topic proportion threshold: 0.05
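For readers who want to approximate these settings with the MALLET command-line tool, the following is a minimal sketch rather than the procedure actually used for this dataset. It assumes the listed specifications map onto the `import-dir` and `train-topics` options shown, that a `mallet` executable is on the PATH, and that the abstracts sit as one text file each in a directory; every file and directory name here is hypothetical.

```python
import shlex

# Hypothetical paths: the dataset description does not say how the
# abstracts were stored or what the output files were called.
ABSTRACTS_DIR = "abstracts"
CORPUS_FILE = "abstracts.mallet"


def build_mallet_commands(mallet_bin="mallet"):
    """Return (import_cmd, train_cmd) as argument lists for MALLET's CLI."""
    # Step 1: convert a directory of plain-text abstracts into MALLET's
    # binary format, dropping stopwords ("Remove stopwords").
    import_cmd = [
        mallet_bin, "import-dir",
        "--input", ABSTRACTS_DIR,
        "--output", CORPUS_FILE,
        "--keep-sequence",        # required for topic modeling
        "--remove-stopwords",
    ]
    # Step 2: train the LDA model with the listed specifications.
    # The mapping of each specification to a flag is an assumption.
    train_cmd = [
        mallet_bin, "train-topics",
        "--input", CORPUS_FILE,
        "--num-topics", "35",              # number of topics
        "--num-iterations", "200",         # no. of iterations
        "--num-top-words", "20",           # no. of topic words printed
        "--doc-topics-threshold", "0.05",  # topic proportion threshold
        "--output-topic-keys", "topic_keys.txt",
        "--output-doc-topics", "doc_topics.txt",
    ]
    return import_cmd, train_cmd


if __name__ == "__main__":
    # Print the commands for inspection instead of running them.
    for cmd in build_mallet_commands():
        print(shlex.join(cmd))
```

To execute the commands rather than print them, each argument list can be passed to `subprocess.run(cmd, check=True)`. Because LDA sampling is randomized, a rerun will generally not reproduce the deposited topics exactly.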

Works

article
Creators
  1. Cynthia Vitale
Deposited January 29, 2019
Access
Open Access
article
Creators
  1. Cynthia Vitale
Deposited February 04, 2019
Access
Open Access

Metadata

Title Topic modeling data for research data management and data curation abstracts
Creator
  1. Cynthia Vitale
Keyword
  1. research data management
  2. data curation
  3. topic modeling
Deposited January 29, 2019
