Statistical Modeling vs. Machine Learning

20 Sep 2014

Common Mistakes

When I started diving into machine learning with a statistics background, one question kept confusing me: what is the difference between statistical modeling and machine learning? Initially, I thought the answer was pretty straightforward. Every stats nerd knows that statistical models come in two kinds: estimation models and prediction models. An estimation model cares about exactly how the explanatory variables are associated with the outcome, whereas a prediction model simply seeks the best prediction of the outcome. In this sense, machine learning looks the same as statistical predictive modeling. Unfortunately, that’s not the case.

External Difference

"Machine learning is statistics minus any checking of models and assumptions."
- Brian D. Ripley (Ripley, 2004)

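A simple example is how differently machine learning people and statistics people talk about linear regression. Statistics people won’t let you use the model until you have checked that its assumptions hold (a linear relationship, independent errors with constant variance, and, for the usual inference, normally distributed errors), but you will probably never hear about those checks from machine learning people.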
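To make the contrast concrete, here is a minimal NumPy sketch of the extra step the statistics side insists on. It is our own illustration, not taken from either camp’s literature: the helpers `fit_ols`, `predict`, and `residual_diagnostics` and the two simulated datasets are hypothetical, and the numeric diagnostics are crude stand-ins for the residual plots and formal tests a statistician would actually use. Both datasets share the same linear mean, but in the second one the noise grows with x, which violates the constant-variance assumption.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def residual_diagnostics(coef, X, y):
    """Crude numeric checks of linear-model assumptions on the residuals (hypothetical helper)."""
    fitted = predict(coef, X)
    resid = y - fitted
    # Split residuals into the low and high halves of the fitted values
    low, high = np.array_split(resid[np.argsort(fitted)], 2)
    z = (resid - resid.mean()) / resid.std()
    return {
        "mean_residual": resid.mean(),             # ~0 whenever an intercept is fitted
        "variance_ratio": high.var() / low.var(),  # ~1 if the error variance is constant
        "skewness": np.mean(z ** 3),               # ~0 if the errors are symmetric (e.g. normal)
    }

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 10, size=n)
X = x.reshape(-1, 1)
y_const = 1 + 2 * x + rng.normal(0, 1, size=n)                # constant noise
y_growing = 1 + 2 * x + rng.normal(0, 0.2 + 0.5 * x, size=n)  # noise grows with x

train, test = slice(0, 1500), slice(1500, n)
results = {}
for name, y in [("constant", y_const), ("growing", y_growing)]:
    coef = fit_ols(X[train], y[train])
    test_mse = np.mean((y[test] - predict(coef, X[test])) ** 2)  # the ML-style summary
    results[name] = {"test_mse": test_mse, **residual_diagnostics(coef, X[train], y[train])}
    print(name, results[name])
```

A typical machine learning workflow would stop at `test_mse`; the statistics workflow would flag the large `variance_ratio` of the second dataset before trusting any standard errors or confidence intervals computed from that fit.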

Internal Difference

Further exploration of this issue reveals that the fundamental differences between statistical modeling and machine learning actually arise from two different modeling cultures (Breiman, 2001).

Traditional statistical methods are more likely to adopt a data modeling perspective. For example, the analysis might start by proposing a stochastic data model, such as a linear regression model, for the “black box” between the predictors and the outcome. A common data model framework assumes that the data are generated by independent draws from:

Response variables = f(predictor variables, random noise, parameters)

The values of the parameters are estimated from the data, and the fitted model is then used both to understand the process that generated the data and to make predictions (e.g., linear regression, logistic regression, the Cox model).
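As a minimal sketch of this workflow, assume the simplest version of the framework above: f is a straight line plus independent Gaussian noise, with “true” parameter values made up purely for the simulation. The NumPy code below simulates data, estimates the parameters by least squares, and then uses the same estimates in both ways the paragraph describes: to quantify the relationship (standard errors and approximate 95% intervals) and to predict a new response.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" data model (values chosen only for this simulation):
# y = beta0 + beta1 * x + noise, with noise ~ N(0, sigma^2), independent draws
beta_true = np.array([1.0, 2.0])
sigma_true = 0.5
n = 500
x = rng.uniform(0, 10, size=n)
y = beta_true[0] + beta_true[1] * x + rng.normal(0, sigma_true, size=n)

# Estimate the parameters by least squares
X = np.column_stack([np.ones(n), x])          # intercept column + predictor
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
p = X.shape[1]
sigma2_hat = resid @ resid / (n - p)          # unbiased noise-variance estimate

# Use 1: understanding, via standard errors and approximate 95% intervals
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
ci_low, ci_high = beta_hat - 1.96 * se, beta_hat + 1.96 * se

# Use 2: prediction, plugging a new predictor value into the same fitted model
x_new = 4.0
y_pred = beta_hat[0] + beta_hat[1] * x_new

for name, est, lo, hi in zip(["intercept", "slope"], beta_hat, ci_low, ci_high):
    print(f"{name}: {est:.3f} (approx. 95% CI {lo:.3f} to {hi:.3f})")
print(f"predicted y at x = {x_new}: {y_pred:.3f}")
```

The 1.96 multiplier is a large-sample normal approximation; an exact interval for this model would use a t quantile with n - 2 degrees of freedom.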

In contrast, the algorithmic modeling culture is popular in machine learning. Here the inside of the black box is treated as complex, unknown, and not necessarily of interest. The approach is to find a function f(x), an algorithm that operates on inputs x, that predicts the responses y well, and models are judged by their predictive accuracy (e.g., decision trees, boosting, neural nets).
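The sketch below illustrates the algorithmic culture under stated assumptions: the data come from a hypothetical noisy sine curve that the modeler pretends not to know, the “algorithm” is a hand-written single-feature regression tree (a toy version of the decision trees mentioned above, not any library’s implementation), and, in the spirit of Breiman’s account, the candidates are compared only by prediction error on held-out data. A straight-line fit is included as a data-model baseline. The names `build_tree` and `predict_tree` are our own.

```python
import numpy as np

def build_tree(x, y, depth, min_leaf=5):
    """Greedy regression tree on a single feature; each leaf stores a mean."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return float(y.mean())
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_sse, best_threshold = np.inf, None
    # Try every split that leaves at least min_leaf points on each side
    for i in range(min_leaf, len(ys) - min_leaf + 1):
        if xs[i] == xs[i - 1]:
            continue  # no threshold separates identical x values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_threshold = sse, (xs[i - 1] + xs[i]) / 2
    if best_threshold is None:
        return float(y.mean())
    mask = x <= best_threshold
    return (best_threshold,
            build_tree(x[mask], y[mask], depth - 1, min_leaf),
            build_tree(x[~mask], y[~mask], depth - 1, min_leaf))

def predict_tree(tree, x0):
    """Walk from the root to a leaf; internal nodes are (threshold, left, right)."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x0 <= threshold else right
    return tree

rng = np.random.default_rng(1)
x = rng.uniform(0, 2 * np.pi, size=600)
y = np.sin(x) + rng.normal(0, 0.2, size=600)   # hypothetical "unknown" process
x_train, y_train, x_test, y_test = x[:400], y[:400], x[400:], y[400:]

tree = build_tree(x_train, y_train, depth=4)
tree_pred = np.array([predict_tree(tree, v) for v in x_test])

slope, intercept = np.polyfit(x_train, y_train, 1)   # straight-line baseline
line_pred = intercept + slope * x_test

tree_mse = np.mean((y_test - tree_pred) ** 2)
line_mse = np.mean((y_test - line_pred) ** 2)
print(f"held-out MSE  tree: {tree_mse:.3f}  line: {line_mse:.3f}")
```

Nothing in the tree says anything interpretable about the sine curve; it is preferred here purely because its held-out error is lower, which is exactly the criterion the algorithmic culture uses.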

Summary

Statistics people believe that the data you observe are generated by some kind of “rule”, and the technical name for this rule is a probability distribution. Once you have settled on a family of distributions, statistical modeling is the process of estimating that distribution’s parameters. That is why checking model assumptions is one of the most important things in a statistician’s mind: it is how you check whether the data are plausibly drawn from the assumed distribution. Machine learning people, by contrast, don’t commit to a distribution. They believe the data come out of a complex, unknown black box governed by rules we may never know, and they focus on making the best predictions rather than on figuring out how the black box works.
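As a minimal illustration of “settle on a distribution, estimate its parameters, then check the assumption”, the sketch below uses simulated data that actually come from an exponential distribution (an assumption made purely for the demo). It fits two candidate distributions by maximum likelihood: a normal (MLE: the sample mean and the divide-by-n standard deviation) and an exponential (MLE: rate = 1 / sample mean). It then compares their maximized log-likelihoods alongside a simple skewness check. The closed-form log-likelihood expressions hold only at those maximum-likelihood estimates.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.exponential(scale=2.0, size=1000)   # the "rule" here is Exponential(mean 2)
n = len(data)

# Candidate 1: normal distribution, parameters by maximum likelihood
mu_hat = data.mean()
sigma_hat = data.std()          # ddof=0 is the MLE
loglik_normal = -0.5 * n * (np.log(2 * np.pi * sigma_hat ** 2) + 1)

# Candidate 2: exponential distribution, rate by maximum likelihood
rate_hat = 1.0 / data.mean()
loglik_exponential = n * (np.log(rate_hat) - 1)

# Assumption check: a normal distribution has zero skewness
z = (data - mu_hat) / sigma_hat
skewness = np.mean(z ** 3)

print(f"normal: mu={mu_hat:.2f}, sigma={sigma_hat:.2f}, loglik={loglik_normal:.1f}")
print(f"exponential: rate={rate_hat:.2f}, loglik={loglik_exponential:.1f}")
print(f"sample skewness={skewness:.2f} (0 under normality)")
```

Both candidates produce parameter estimates without complaint; only the check (a strongly positive skewness and a much lower log-likelihood for the normal fit) reveals that the normal assumption is the wrong “rule” for these data.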

References