Dataset for Topic Modeling and Facet Analysis in an Emerging Domain: Research Data Management and Data Curation

Research data management (RDM) “concerns the organization of data, from its entry to the research lifecycle through to the dissemination and archiving of valuable results”, and is related to the field of data curation (DC), a subset of digital curation. Together, RDM and DC (RDM/DC) allow information professionals to work with clients and each other to make data available in support of the research enterprise. An emerging area of scholarly communication, RDM/DC offers a rich subject of study from the perspective of knowledge organization (KO). This paper explores the following research question: What can facet analysis tell us about the emerging field of RDM/DC? First, the MAchine Learning for LanguagE Toolkit (MALLET) implementation of Latent Dirichlet Allocation (LDA) is used for topic modeling of abstracts of the RDM/DC scholarly literature. A preliminary analysis of this empirical data by the research team yields a number of topics and, when possible, their relevant aspects or contexts. Facet analysis principles are next applied to these results, producing four general facets: Practice, Stakeholders, Resources, and Study of RDM/DC. Complex notions infused throughout the field, such as “services” and “metadata”, do not appear outright in the analysis, however. Each facet is then further explored through logical division, and the resulting system is encoded in Protégé and visualized using WebVOWL. We conclude that the major areas of emphasis in this data-intensive field will be of fundamental interest to those in LIS, in scholarly communication, and perhaps increasingly, in KO and other fields that manage and make available data of all kinds.



Work Title Dataset for Topic Modeling and Facet Analysis in an Emerging Domain: Research Data Management and Data Curation
Open Access
Creators
  1. Cynthia Vitale
  2. Heather Moulaison Sandy
  3. Denice Adkins
  4. Heather Gayle Froehlich
Keywords
  1. research data management
  2. topic modeling
  3. research data curation
  4. knowledge organization
License CC BY 4.0 (Attribution)
Work Type Dataset
DOI doi:10.26207/hrz9-9963
Deposited May 16, 2019





Work History

Version 1

  • Created
  • Added RD_Topics_Words-dca.xls
  • Added Creator Cynthia Vitale
  • Added Creator Heather Moulaison Sandy
  • Added Creator Denice Adkins
  • Added Creator Heather Gayle Froehlich
  • Published