Statistical inference in massive data sets

Analysis of massive data sets is challenging owing to the limitations of computer primary memory. In this paper, we propose an approach to estimating population parameters from a massive data set. The proposed approach significantly reduces the required amount of primary memory, and the resulting estimate is as efficient as if the entire data set were analyzed simultaneously. Asymptotic properties of the resulting estimate are studied, and the asymptotic normality of the resulting estimator is established. A standard error formula for the resulting estimate is proposed and empirically tested; thus, statistical inference for the parameters of interest can be performed. The effectiveness of the proposed approach is illustrated using simulation studies and an Internet traffic data example.

Metadata

Work Title Statistical inference in massive data sets
Access
Open Access
Creators
  1. Runze Li
  2. Dennis K.J. Lin
  3. Bing Li
Keyword
  1. Internet traffic data
  2. Kernel density estimation
  3. Remedian
License In Copyright (Rights Reserved)
Work Type Article
Publisher
  1. Applied Stochastic Models in Business and Industry
Publication Date July 5, 2012
Publisher Identifier (DOI)
  1. https://doi.org/10.1002/asmb.1927
Deposited July 19, 2022

Work History

Version 1
published

  • Created
  • Added Appl_Stoch_Models_Bus_Ind_-_2012_-_Li_-_Statistical_inference_in_massive_data_sets.pdf
  • Added Creator Runze Li
  • Added Creator Dennis K.J. Lin
  • Added Creator Bing Li
  • Published
  • Updated Keyword, Publication Date
    Keyword
    • Internet traffic data, Kernel density estimation, Remedian
    Publication Date
    • 2013-09-01
    • 2012-07-05
  • Updated