
Stochastic Learning of Semiparametric Monotone Index Models with Large Sample Size

I study the estimation of semiparametric monotone index models in the scenario where the number of observations $n$ is extremely large and conventional approaches fail to work due to the heavy computational burden. Motivated by the mini-batch gradient descent algorithm (MBGD) that is widely used as a stochastic optimization tool in the machine learning field, I propose a novel subsample- and iteration-based estimation procedure. In particular, starting from any initial guess of the true parameter, I progressively update the parameter using a sequence of subsamples randomly drawn from the data set, each of whose size is much smaller than $n$. The update is based on the gradient of a well-chosen loss function, where the nonparametric component is replaced with its Nadaraya-Watson kernel estimator computed on the subsample. My proposed algorithm essentially generalizes the MBGD algorithm to the semiparametric setup. Compared with the full-sample-based method, the new method reduces the computational time by roughly a factor of $n$ if the subsample size and the kernel function are chosen properly, and so can be easily applied when the sample size $n$ is large. Moreover, I show that if I further average the estimators produced across iterations, the difference between the average estimator and the full-sample-based estimator is $1/\sqrt{n}$-trivial. Consequently, the average estimator is $1/\sqrt{n}$-consistent and asymptotically normally distributed. In other words, the new estimator substantially improves the computational speed while maintaining the estimation accuracy.
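To illustrate the kind of procedure the abstract describes, the following is a minimal sketch, not the paper's actual algorithm: it assumes a least-squares loss, a Gaussian kernel, a unit-norm identification normalization, a fixed step size, and Polyak-Ruppert-style averaging of the iterates. All names (`nw_estimate`, `mbgd_monotone_index`, `subsample_size`, `bandwidth`, `step`) and tuning choices are illustrative assumptions, not the paper's notation or recommended settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def nw_estimate(index, y, grid, bandwidth):
    """Nadaraya-Watson estimate of E[y | index = grid] on a subsample (Gaussian kernel)."""
    w = np.exp(-0.5 * ((grid[:, None] - index[None, :]) / bandwidth) ** 2)
    return (w @ y) / np.clip(w.sum(axis=1), 1e-12, None)

def mbgd_monotone_index(x, y, subsample_size=200, n_iter=2000, step=0.5, bandwidth=0.3):
    """Subsample-based gradient updates of beta; returns the averaged iterate."""
    n, d = x.shape
    beta = rng.normal(size=d)                 # arbitrary initial guess
    beta /= np.linalg.norm(beta)              # unit-norm normalization (assumed, for scale identification)
    beta_bar = np.zeros(d)                    # running average across iterations
    for t in range(1, n_iter + 1):
        idx = rng.choice(n, size=subsample_size, replace=False)
        xb, yb = x[idx], y[idx]
        index = xb @ beta
        # Replace the unknown link with its kernel estimator on the subsample.
        g_hat = nw_estimate(index, yb, index, bandwidth)
        # Crude numerical derivative of the estimated link (illustrative choice).
        eps = 1e-3
        g_prime = (nw_estimate(index, yb, index + eps, bandwidth) - g_hat) / eps
        # Gradient of a least-squares loss with the plugged-in link estimate.
        grad = -2.0 * ((yb - g_hat) * g_prime) @ xb / subsample_size
        beta = beta - step * grad
        beta /= np.linalg.norm(beta)          # keep the assumed scale normalization
        beta_bar += (beta - beta_bar) / t     # Polyak-Ruppert style averaging
    return beta_bar

# Toy data from a monotone index model y = G(x'beta0) + noise, with a logistic link G.
n, d = 100_000, 3
beta0 = np.array([1.0, -0.5, 0.5])
x = rng.normal(size=(n, d))
y = 1.0 / (1.0 + np.exp(-(x @ beta0))) + 0.1 * rng.normal(size=n)

print("target (normalized):", beta0 / np.linalg.norm(beta0))
print("averaged estimate:  ", mbgd_monotone_index(x, y))
```

Each iteration only touches a subsample of size `subsample_size`, so the per-step kernel computation is of order `subsample_size**2` rather than `n**2`, which is the source of the computational savings the abstract claims; the averaging step is what is meant to recover full-sample-level accuracy.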
