Book list for ensemble learning and stacking

I have been studying blending and stacking of machine learning models for a while, as described in a previous post: [note] Ensembling multiple machine learning models. I applied blending, extended from the general version, in my recent Kaggle competitions, but it did not perform as well as I had expected.

To strengthen both my theory and practice of ensembling, stacking, and blending, I did some searching online and added these two books to my to-read list:

Ensemble Methods: Foundations and Algorithms
Author: Zhi-Hua Zhou, 2012

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition
Authors: Ian H. Witten, Eibe Frank, Mark A. Hall, 2011

People’s recommendations

“The authors provide enough theory to enable practical application, and it is this practical focus that separates this book from most, if not all, other books on this subject.” -Dorian Pyle, Director of Modeling at Numetrics

张垚 from ZHIHU: I am currently reading Professor Zhi-Hua Zhou's Ensemble Methods: Foundations and Algorithms. It is fairly difficult, but very satisfying to read.

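wepon from ZHIHU: Data Mining: Practical Machine Learning Tools and Techniques - read Chapter 8 carefully, especially the authors and papers mentioned in Further Reading. Then practice in real competitions: study the solutions of the top-ranked experts and reproduce them yourself once or twice.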

Reference / Additional Information

ZHIHU - Where should I start learning ensemble learning?
Kaggle Forum - Blended Ensemble - Why it never works for me ?

[note] Launch AWS EC2 instance for Kaggle competition

I had been using a single laptop, a mid-2012 MBP, to participate in Kaggle competitions for a while.

In the last few days of the BNP Paribas competition on Kaggle, computation speed limited how many ideas I could try (each XGBoost or RF run took nearly one hour). So I turned to AWS EC2 for more firepower, and I was satisfied with the final result: my first Top 10% finish in a competition. It really helped me a lot.

Here is a note on how I configured an Ubuntu-based m4.2xlarge instance with 8 cores and 32 GB of memory, and installed the required Python libraries, including sklearn, pandas, scipy, and numpy.



[note] MySQL: reverse version of LIKE


We all know that LIKE helps us find all rows in which a given field contains a specific keyword.


What if we need to do the opposite, i.e., find rows whose values are CONTAINED in our query text?
Yeah, it’s my first time running into this kind of use case…


SELECT name FROM my_table
WHERE 'John Smith and Peter Johnson are best friends' LIKE
  CONCAT('%', name, '%');

and then you will get all rows whose name appears in the text, e.g. John and Peter.
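If you want to try the idea without a MySQL server, here is a minimal Python sketch using the standard-library sqlite3 module. It is only a stand-in under a few assumptions: SQLite builds the pattern with `||` where the MySQL query uses CONCAT, the helper name `names_contained_in` is hypothetical, and the table contents (John, Peter, Mary) are made up for illustration.

```python
import sqlite3

TEXT = "John Smith and Peter Johnson are best friends"


def names_contained_in(text, names):
    """Return the names (hypothetical sample data) that occur inside `text`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (name TEXT)")
    conn.executemany(
        "INSERT INTO my_table (name) VALUES (?)", [(n,) for n in names]
    )
    # Reverse LIKE: the query text is on the left-hand side, and the
    # pattern is built from each row's own value.
    rows = conn.execute(
        "SELECT name FROM my_table "
        "WHERE ? LIKE '%' || name || '%' "
        "ORDER BY name",
        (text,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(names_contained_in(TEXT, ["John", "Peter", "Mary"]))
```

One thing to watch: because each name becomes part of the LIKE pattern, a `%` or `_` stored in the column acts as a wildcard rather than a literal character.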

Another option is to use REGEXP:

SELECT  name 
FROM    my_table 
WHERE   'John Smith and Peter Johnson are best friends' REGEXP name;
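With this variant each name is interpreted as a regular expression, so special characters in the column change what matches. The sketch below illustrates that with Python's sqlite3 module as a stand-in for MySQL. Assumptions: SQLite has no built-in REGEXP, so a hypothetical `regexp` helper based on Python's `re.search` is registered here, and its regex dialect and case sensitivity may differ from MySQL's. The sample names are made up.

```python
import re
import sqlite3

TEXT = "John Smith and Peter Johnson are best friends"


def regexp(pattern, value):
    # SQLite evaluates `X REGEXP Y` as regexp(Y, X): pattern first, value second.
    return 1 if re.search(pattern, value) else 0


def names_matching(text, names):
    """Return the names (hypothetical sample data) that match `text` as regexes."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, regexp)
    conn.execute("CREATE TABLE my_table (name TEXT)")
    conn.executemany(
        "INSERT INTO my_table (name) VALUES (?)", [(n,) for n in names]
    )
    rows = conn.execute(
        "SELECT name FROM my_table WHERE ? REGEXP name ORDER BY name",
        (text,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # "Pet.r" matches "Peter" because "." stands for any single character.
    print(names_matching(TEXT, ["John", "Pet.r", "Mary"]))
```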


Stackoverflow - MySQL: What is a reverse version of LIKE?