[note] Ensembling multiple machine learning models

Ensembling

Blending and Stacked Generalization

https://github.com/MLWave/Kaggle-Ensemble-Guide/blob/master/blend_proba.py
http://mlwave.com/kaggle-ensembling-guide/

Blending has a few benefits (extracted from the guide linked above); a code sketch follows the cons list below:

  • It is simpler than stacking.
  • It wards against an information leak: The generalizers and stackers use different data.
  • You do not need to share a seed for stratified folds with your teammates. Anyone can throw models in the ‘blender’ and the blender decides if it wants to keep that model or not.

The cons are:

  • You use less data overall.
  • The final model may overfit to the holdout set.
  • Your CV is more solid with stacking (calculated over more folds) than with a single small holdout set.
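A minimal blending sketch, assuming scikit-learn and a synthetic dataset; the models, split size, and variable names are illustrative and not taken from the linked scripts:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Split once: base models train on X_train, the blender trains on the holdout.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.3, random_state=0)

base_models = [
    RandomForestClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(random_state=0),
]

# Base-model probabilities on the holdout become the blender's features,
# so the blender never sees rows the base models were fitted on.
holdout_features = np.column_stack([
    model.fit(X_train, y_train).predict_proba(X_holdout)[:, 1]
    for model in base_models
])

blender = LogisticRegression().fit(holdout_features, y_holdout)

def blend_predict(X_test):
    # Stack base-model probabilities, then let the blender combine them.
    test_features = np.column_stack([
        m.predict_proba(X_test)[:, 1] for m in base_models
    ])
    return blender.predict_proba(test_features)[:, 1]
```

This shows the trade-off in the lists above: the holdout set is spent on the blender, so less data is used overall, but the base models and the blender see disjoint data.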

http://www.r-bloggers.com/caretensemble-classification-example/
https://www.kaggle.com/vivekag/liberty-mutual-group-property-inspection-prediction/ensemble-of-rf-and-xgboost-in-r/run/22021
http://stackoverflow.com/questions/14114046/select-majority-number-of-each-row-in-matrix-using-r
http://amunategui.github.io/blending-models/
http://www.kdnuggets.com/2015/06/ensembles-kaggle-data-science-competition-p3.html

Code: https://github.com/paullo0106/kaggle_pbr/blob/master/blend.py

Out-of-fold prediction: train on k-1 folds, predict the held-out fold, and repeat over all folds so every training row gets a prediction from a model that never saw it. These out-of-fold predictions are the meta-features used for stacking.
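A minimal sketch of out-of-fold predictions feeding a stacker, again assuming scikit-learn with an illustrative dataset and models:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof = np.zeros(len(y))  # one prediction per training row, never from a model that saw that row

for train_idx, valid_idx in kf.split(X, y):
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    # Predict only the held-out fold: these rows were not used for fitting.
    oof[valid_idx] = model.predict_proba(X[valid_idx])[:, 1]

# The out-of-fold column(s) become meta-features for the second-level model,
# which is fit on the full training set without leakage.
stacker = LogisticRegression().fit(oof.reshape(-1, 1), y)
```

Unlike blending, no rows are set aside permanently: every training row contributes both to the base models (in k-1 folds) and to the stacker (via its out-of-fold prediction), which is why the CV is calculated over more folds.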