Extrapolation-based tuning parameters selection in massive data analysis (大规模数据分析中基于外推的调节参数选取)
Many statistical modeling procedures involve one or more tuning parameters that control model complexity. Examples include the bandwidth in kernel smoothing for nonparametric regression and density estimation, and the regularization parameter in regularization methods for feature selection in high-dimensional modeling. Tuning parameter selection plays a critical role in statistical modeling and machine learning. In massive data analysis, commonly used methods such as grid search with information criteria become prohibitively expensive computationally, and their feasibility is questionable even on modern parallel computing platforms. This paper develops a fast algorithm that efficiently approximates the best tuning parameters. The algorithm entails (a) assuming a parametric model that describes the trend between the best tuning parameters and the sample size, (b) estimating this trend by fitting the model to subsampled data, and (c) extrapolating the trend to the full, very large sample size. To determine which subsample sizes to use, we derive optimal designs for settings with a constraint on the total computational budget. We show that the proposed designs possess an asymptotic optimality property. Our numerical studies demonstrate that, with a simple two-parameter polynomial model, the proposed algorithm performs almost equivalently to the procedure that uses the full data set in several statistical settings, while substantially reducing computing time and storage.
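Steps (a) to (c) of the abstract can be illustrated with a minimal Python sketch. It makes several assumptions that are not taken from the paper: the trend is modeled as a power law, `param(n) = exp(a) * n**b`, as one possible reading of a "two-parameter" model; the subsample sizes and number of replicates are chosen by hand rather than by the optimal design the paper derives; and the function names `select_by_subsampling`, `fit_trend`, and `extrapolate`, as well as the user-supplied `selector` callable, are hypothetical.

```python
import numpy as np


def select_by_subsampling(data, sizes, selector, n_reps=5, seed=0):
    """Step (b), part 1: average the selected tuning parameter per subsample size.

    `selector` is a user-supplied (hypothetical) callable that takes a subsample
    and returns the best tuning parameter for it, e.g. via a grid search.
    """
    rng = np.random.default_rng(seed)
    means = []
    for m in sizes:
        # Draw n_reps subsamples of size m without replacement (rows if data is 2-D).
        picks = [selector(rng.choice(data, size=m, replace=False))
                 for _ in range(n_reps)]
        means.append(np.mean(picks))
    return np.array(means)


def fit_trend(sizes, best_params):
    """Step (b), part 2: least-squares fit of log(param) = a + b * log(n).

    The power-law form is an assumption of this sketch, not the paper's model.
    """
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    b, a = np.polyfit(np.log(sizes), np.log(best_params), deg=1)
    return a, b


def extrapolate(a, b, n_full):
    """Step (c): evaluate the fitted trend at the full sample size."""
    return float(np.exp(a + b * np.log(n_full)))
```

In use, one would call `select_by_subsampling` on a handful of small sizes, pass the result to `fit_trend`, and evaluate `extrapolate` at the full sample size, so the expensive grid search is only ever run on subsamples.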
Metadata
Work Title | Extrapolation-based tuning parameters selection in massive data analysis (大规模数据分析中基于外推的调节参数选取) |
---|---|
License | In Copyright (Rights Reserved) |
Work Type | Article |
Publication Date | June 1, 2022 |
Deposited | March 29, 2023 |