no code implementations • 30 Mar 2021 • Siyu Zhou, Lucas Mentch
Due to their long-standing reputation as excellent off-the-shelf predictors, random forests remain a go-to model of choice for applied statisticians and data scientists.
no code implementations • 23 Feb 2021 • Lucas Mentch, Giles Hooker
In 2001, Leo Breiman wrote of a divide between "data modeling" and "algorithmic modeling" cultures.
1 code implementation • ACL 2020 • Taehee Jung, Dongyeop Kang, Hua Cheng, Lucas Mentch, Thomas Schaaf
Here we propose an end-to-end training procedure called posterior calibrated (PosCal) training that directly optimizes the task objective while minimizing the difference between the predicted and empirical posterior probabilities. We show that PosCal not only reduces calibration error but also improves task performance by penalizing drops in performance on both objectives.
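The snippet above describes a loss that combines the task objective with a penalty on the gap between predicted and empirical posteriors. Below is a minimal sketch of that idea for binary classification, assuming a binned calibration penalty added to cross-entropy; the function names and binning scheme are illustrative, not the paper's actual PosCal implementation.

```python
import numpy as np

def calibration_penalty(probs, labels, n_bins=10):
    """Bin-weighted squared gap between predicted and empirical positive rates.

    A simplified stand-in for a calibration penalty: partition predictions
    into confidence bins and compare the mean predicted probability with
    the empirical frequency of the positive class in each bin.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    penalty = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        if hi < 1.0:
            mask = (probs >= lo) & (probs < hi)
        else:  # include the right endpoint in the last bin
            mask = (probs >= lo) & (probs <= hi)
        if mask.any():
            penalty += mask.mean() * (probs[mask].mean() - labels[mask].mean()) ** 2
    return penalty

def poscal_style_loss(probs, labels, lam=1.0):
    """Task loss (binary cross-entropy) plus the calibration penalty."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    eps = 1e-12
    ce = -np.mean(labels * np.log(probs + eps)
                  + (1 - labels) * np.log(1 - probs + eps))
    return ce + lam * calibration_penalty(probs, labels)
```

Perfectly calibrated predictions incur zero penalty, so the combined loss reduces to the task objective; overconfident predictions are pushed back toward their empirical frequencies.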
no code implementations • 7 Mar 2020 • Lucas Mentch, Siyu Zhou
As the size, complexity, and availability of data continues to grow, scientists are increasingly relying upon black-box learning algorithms that can often provide accurate predictions with minimal a priori model specifications.
1 code implementation • 2 Dec 2019 • Zhengze Zhou, Lucas Mentch, Giles Hooker
This paper develops a general framework for analyzing asymptotics of $V$-statistics.
1 code implementation • 1 Nov 2019 • Lucas Mentch, Siyu Zhou
Random forests remain among the most popular off-the-shelf supervised machine learning tools with a well-established track record of predictive accuracy in both regression and classification settings.
1 code implementation • IJCNLP 2019 • Taehee Jung, Dongyeop Kang, Lucas Mentch, Eduard Hovy
We find that while position exhibits substantial bias in news articles, this is not the case, for example, with academic papers and meeting minutes.
1 code implementation • 27 Aug 2019 • Tim Coleman, Kimberly Kaufeld, Mary Frances Dorn, Lucas Mentch
To estimate these ratios from an unlabeled test set, we make the covariate shift assumption, under which the training and test distributions differ only in the marginal distribution of the covariates (Shimodaira, 2000).
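Under the covariate shift assumption, one common way to estimate such density ratios is to train a probabilistic classifier to distinguish training from test covariates and convert its odds into a ratio. The sketch below shows that standard trick; it is an illustration of the general technique, not necessarily the estimator used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_density_ratio(X_train, X_test):
    """Estimate q(x)/p(x) at the training points via a train-vs-test classifier.

    Label training covariates 0 and test covariates 1, fit P(test | x),
    then convert the classifier odds into a density ratio, correcting
    for unequal sample sizes.
    """
    X = np.vstack([X_train, X_test])
    z = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    clf = LogisticRegression(max_iter=1000).fit(X, z)
    p_test = clf.predict_proba(X_train)[:, 1]  # P(test | x) at training points
    # odds times n_train/n_test recovers the density ratio q(x)/p(x)
    return (p_test / (1.0 - p_test)) * (len(X_train) / len(X_test))
```

When the two samples come from the same distribution, the classifier finds no signal and the estimated ratios concentrate near 1.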
no code implementations • 25 May 2019 • Wei Peng, Tim Coleman, Lucas Mentch
Random forests remain among the most popular off-the-shelf supervised learning algorithms.
1 code implementation • 1 May 2019 • Giles Hooker, Lucas Mentch, Siyu Zhou
This paper reviews and advocates against the use of permute-and-predict (PaP) methods for interpreting black box functions.
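For context on what is being critiqued: a permute-and-predict (PaP) importance score shuffles one feature column and measures the resulting drop in predictive performance. A minimal sketch of the generic recipe, with illustrative names, is below; note the paper argues against relying on this procedure.

```python
import numpy as np

def pap_importance(model, X, y, metric, rng=None):
    """Permute-and-predict variable importance.

    For each column, shuffle it to break its association with the
    response and record the increase in the error metric relative
    to the unpermuted baseline.
    """
    rng = np.random.default_rng(rng)
    base = metric(y, model.predict(X))
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # break column j's association
        scores.append(metric(y, model.predict(Xp)) - base)
    return np.array(scores)
```

Features the model relies on produce large error increases when shuffled; the paper's concern is that, with correlated features, these permuted points fall far from the training data and the scores can badly mislead.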
2 code implementations • 16 Apr 2019 • Tim Coleman, Wei Peng, Lucas Mentch
Throughout the last decade, random forests have established themselves as among the most accurate and popular supervised learning methods.
no code implementations • 29 Jun 2016 • Duy Hoang Thai, Lucas Mentch
Segmentation remains an important problem in image processing.
no code implementations • 1 Jun 2015 • Giles Hooker, Lucas Mentch
This paper examines the use of a residual bootstrap for bias correction in machine learning regression methods.
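One simple form of the idea: resample residuals around the fitted values to build synthetic responses, refit the learner, and use the average shift of the refitted predictions as a pointwise bias estimate. The sketch below follows that generic scheme with illustrative names; it is not claimed to be the paper's exact procedure.

```python
import numpy as np

def residual_bootstrap_bias(fit_predict, X, y, n_boot=50, rng=None):
    """Estimate pointwise prediction bias via a residual bootstrap.

    fit_predict(X, y) should train a fresh model and return its
    predictions at X. Synthetic responses are built by resampling
    residuals around the original fitted values; the average shift
    of the refitted predictions estimates the bias, which can then
    be subtracted from the original predictions.
    """
    rng = np.random.default_rng(rng)
    preds = fit_predict(X, y)
    resid = y - preds
    boot = np.zeros_like(preds)
    for _ in range(n_boot):
        y_star = preds + rng.choice(resid, size=len(resid), replace=True)
        boot += fit_predict(X, y_star)
    return boot / n_boot - preds  # estimated bias at each point
```

For an (approximately) unbiased learner such as ordinary least squares on well-specified data, the estimated bias is near zero, so the correction leaves the fit essentially unchanged.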
no code implementations • 7 Jun 2014 • Lucas Mentch, Giles Hooker
While statistical learning methods have proved powerful tools for predictive modeling, the black-box nature of the models they produce can severely limit their interpretability and the ability to conduct formal inference.
no code implementations • 25 Apr 2014 • Lucas Mentch, Giles Hooker
Instead of aggregating full bootstrap samples, we consider predicting by averaging over trees built on subsamples of the training set and demonstrate that the resulting estimator takes the form of a U-statistic.
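The construction described above — averaging trees built on subsamples drawn without replacement rather than bootstrap samples — can be sketched as follows. This is a minimal illustration of the estimator's structure (a U-statistic with the tree as its kernel), not the authors' implementation; function names and defaults are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def subsampled_forest_predict(X, y, X_new, n_trees=100, subsample=0.5, rng=None):
    """Average predictions of trees grown on subsamples drawn WITHOUT
    replacement, so the ensemble prediction has the form of a
    U-statistic with the individual tree as its kernel."""
    rng = np.random.default_rng(rng)
    n = len(X)
    k = max(2, int(subsample * n))
    preds = np.zeros(len(X_new))
    for _ in range(n_trees):
        idx = rng.choice(n, size=k, replace=False)  # subsample, not bootstrap
        tree = DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx])
        preds += tree.predict(X_new)
    return preds / n_trees
```

The practical payoff of the U-statistic form is that classical asymptotic theory applies, so the variance of these averaged predictions can be estimated and used for formal inference.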