Extrapolation-based tuning parameters selection in massive data analysis (大规模数据分析中基于外推的调节参数选取)

Many statistical modeling procedures involve one or more tuning parameters that control model complexity. Examples include the bandwidth of kernel smoothing in nonparametric regression and density estimation, and the regularization parameter of regularization methods for feature selection in high-dimensional modeling. Tuning parameter selection plays a critical role in statistical modeling and machine learning. In massive data analysis, commonly used methods such as grid search with information criteria become prohibitively expensive to compute, and their feasibility is questionable even on modern parallel computing platforms. This paper develops a fast algorithm that efficiently approximates the best tuning parameters. The algorithm (a) assumes a parametric model describing how the best tuning parameter varies with the sample size, (b) estimates this trend by fitting the model to subsampled data, and (c) extrapolates the trend to the huge full sample size. To determine which subsample sizes to use, we derive optimal designs under a constraint on the total computational budget, and we show that the proposed designs possess an asymptotic optimality property. Our numerical studies show that, with a simple two-parameter polynomial model, the proposed algorithm performs almost as well as the full-data procedure in several statistical settings while substantially reducing computing time and storage.
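Steps (a) to (c) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. The two-parameter trend is assumed to be a power law lambda(n) = a * n^(-b), which is one reading of a "two-parameter polynomial model," and it is fitted by least squares on the log-log scale. The subsample sizes are a hypothetical doubling sequence rather than the optimal budget-constrained design derived in the paper. The names `fit_power_law`, `extrapolate`, `extrapolated_tuning` and `silverman_bandwidth` are hypothetical. Silverman's rule-of-thumb bandwidth is used as a stand-in selector only because its true trend (b = 1/5) is known, so the output can be checked.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a selector maps a (sub)sample to its best tuning parameter.
Selector = Callable[[Sequence[float]], float]


def fit_power_law(sizes: Sequence[int], lambdas: Sequence[float]) -> Tuple[float, float]:
    """Step (a)+(b): least-squares fit of log(lambda) = log(a) - b * log(n).

    Assumed trend lambda(n) = a * n**(-b); this model form is an assumption.
    Returns (a, b).
    """
    if len(sizes) != len(lambdas) or len(sizes) < 2:
        raise ValueError("need matching sizes and lambdas, at least two points")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(lam) for lam in lambdas]  # requires every lambda > 0
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("need at least two distinct subsample sizes")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope


def extrapolate(a: float, b: float, n: int) -> float:
    """Step (c): evaluate the fitted trend at sample size n (e.g. the full data size)."""
    return a * n ** (-b)


def extrapolated_tuning(
    data: Sequence[float],
    selector: Selector,
    sizes: Sequence[int],
    reps: int = 1,
    seed: int = 0,
) -> float:
    """Tune on random subsamples, fit the trend, and extrapolate to len(data)."""
    rng = random.Random(seed)
    used_sizes: List[int] = []
    lambdas: List[float] = []
    for m in sizes:
        for _ in range(reps):
            subsample = rng.sample(data, m)  # simple random sample without replacement
            lambdas.append(selector(subsample))
            used_sizes.append(m)
    a, b = fit_power_law(used_sizes, lambdas)
    return extrapolate(a, b, len(data))


def silverman_bandwidth(x: Sequence[float]) -> float:
    """Stand-in selector with a known trend: 1.06 * sd * n**(-1/5)."""
    return 1.06 * statistics.stdev(x) * len(x) ** (-0.2)


if __name__ == "__main__":
    gen = random.Random(1)
    full = [gen.gauss(0.0, 1.0) for _ in range(100_000)]
    h_hat = extrapolated_tuning(full, silverman_bandwidth, sizes=[500, 1000, 2000, 4000])
    h_full = silverman_bandwidth(full)  # what the rule gives on the full data
    print(f"extrapolated h = {h_hat:.4f}, full-data h = {h_full:.4f}")
```

In a real application `selector` would be the costly step, for example cross-validation or an information-criterion grid search run on each subsample; fitting and extrapolating the trend costs only a few logarithms, which is where the savings over full-data tuning come from.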

Metadata

Work Title: Extrapolation-based tuning parameters selection in massive data analysis (大规模数据分析中基于外推的调节参数选取)
Access: Open Access
Creators:
  1. Haojie Ren
  2. Changliang Zou
  3. Runze Li
License: In Copyright (Rights Reserved)
Work Type: Article
Publisher: Scientia Sinica Mathematica
Publication Date: June 1, 2022
Language: Chinese
Publisher Identifier (DOI): https://doi.org/10.1360/SCM-2020-0622
Deposited: March 29, 2023