Study comparing Google Scholar to Compendex and Scopus using engineering dissertations

This study uses engineering dissertations from ProQuest Dissertations & Theses to create a data set of citations for comparing the fee-based databases Compendex and Scopus against Google Scholar. From 1950 to 2017, Google Scholar outperformed both Compendex and Scopus in discoverability of citations in nine engineering subjects. The "readme.txt" file explains our methods and data. The file "GoogleScholarData.csv" contains 20 randomly sampled citations for each of the 9 subjects in each of 7 decades (20 x 9 x 7 = 1260 data records). The script "parse_dissertation_citiations_html.R" extracted the data from the ProQuest website.

README

Title: Study comparing Google Scholar to Compendex and Scopus using engineering dissertations

Creators: Carmen Cole ccc143@psu.edu, Penn State University
Angela R. Davis ard21@psu.edu, Penn State University
Vanessa Eyer vld5011@psu.edu, Penn State University
John J. Meier jjm38@psu.edu, Penn State University

Contributor: Robert Olendorf rko5039@psu.edu, Penn State University

Abstract: This study compared the completeness of engineering dissertation citations in Google Scholar, Compendex, and Scopus. The researchers used
ProQuest Digital Dissertations as the source, searching only for 2016 doctoral dissertations in the U.S. and limiting results to those with full-text reference lists.
Subject headings for 9 engineering disciplines were used to search. From the results, the URLs of all reference lists were saved.

Description:
The data were retrieved using an R script: https://github.com/olendorf/parsedthesiscitations
The folder "data/web_pages/" contains all HTML pages downloaded from ProQuest.
The folder "data/" contains a .csv file of all citations with the four-digit number (year) extracted by the R script.
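
The year extraction described above can be illustrated with a minimal sketch. This is not the authors' R script: it is a Python approximation, and the pattern (first standalone four-digit number beginning with 18, 19, or 20) and the function name extract_year are assumptions made for illustration. The example citations in the usage note are invented placeholders.

```python
import re

# Assumed pattern: a standalone four-digit number starting with 18, 19 or 20.
# The real R script may use a different rule.
YEAR_PATTERN = re.compile(r"\b(?:18|19|20)\d{2}\b")


def extract_year(citation):
    """Return the first year-like four-digit number in the citation, or None."""
    match = YEAR_PATTERN.search(citation)
    return int(match.group()) if match else None
```

With this rule, a placeholder citation such as "Placeholder, A. (1987). Example title." yields 1987, while "Example Journal 12(3), 1900-1912, 1987." yields 1900 because the page range comes first. Cases like the second one are a plausible reason to treat an automatically extracted year as unverified.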

The file "GoogleScholarData.csv" contains 20 randomly sampled citations for each of the 9 subjects in each of 7 decades (1260 data records).
Each record in the file is a comma-delimited data element containing the following fields:

Data Dictionary:

Unique is a unique identifier for each record
Subject is the engineering field of the dissertation
Citation is the full-text citation, enclosed in double quotes, from a randomly sampled dissertation in the subject
Year is the year extracted by a script from the record (not error checked)
Format is the type of reference (Book, Conference, Journal, Other) assigned by the researchers
Google Scholar contains an F if the citation was found in Google Scholar, C for a partial record, and N for not found
Compendex contains an F if the citation was found in Compendex, C for a partial record, and N for not found
Scopus contains an F if the citation was found in Scopus, C for a partial record, and N for not found
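
As an illustration of how the fields above might be used, the following Python sketch tallies the F / C / N codes for each database. It assumes that the CSV header names match the field names in the data dictionary exactly, which has not been checked against the actual file. The sample rows are invented placeholders, not records from "GoogleScholarData.csv", and the function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows, NOT taken from GoogleScholarData.csv.
# Header names are assumed to match the data dictionary.
SAMPLE = '''Unique,Subject,Citation,Year,Format,Google Scholar,Compendex,Scopus
1,Subject A,"Placeholder, A. Example citation one.",1999,Journal,F,F,N
2,Subject A,"Placeholder, B. Example citation two.",2005,Book,F,N,N
3,Subject B,"Placeholder, C. Example citation three.",2010,Conference,C,F,F
'''

DATABASES = ("Google Scholar", "Compendex", "Scopus")


def tally_codes(handle):
    """Count the F / C / N codes in each database column."""
    counts = {db: Counter() for db in DATABASES}
    for row in csv.DictReader(handle):
        for db in DATABASES:
            counts[db][row[db].strip()] += 1
    return counts


def found_share(counter):
    """Fraction of records coded F; partial records (C) are not counted as found."""
    total = sum(counter.values())
    return counter["F"] / total if total else 0.0


counts = tally_codes(io.StringIO(SAMPLE))
for db in DATABASES:
    print(db, dict(counts[db]), round(found_share(counts[db]), 2))
```

To run the same tally on the real file, pass an open file handle instead of the in-memory sample, for example open("GoogleScholarData.csv", newline=""). The file's encoding is not stated in this README, so an explicit encoding argument may be needed.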

Citation:

To cite these data, you may use the format:

Cole, C., Davis, A.R., Eyer, V., Meier, J.J. 2017. Study comparing Google
Scholar to Compendex and Scopus using engineering dissertations. DOI: 10.18113/S1634X
