Machine learning

The primary library for machine learning in Python is scikit-learn, which has its own great tutorial page.

If you’re wondering about the difference between statsmodels and scikit-learn, the answer is: there’s no easy answer.

statsmodels is primarily written for and by econometricians, while scikit-learn is primarily written for and by computer scientists and people doing machine learning. But the relationship between “econometrics” and “machine learning” is complicated. In very broad terms, machine learning tends to focus on prediction while econometrics tends to focus on testing hypotheses. But that’s somewhat simplistic.

The relationship is complicated because econometrics and machine learning both emerged when people in specific disciplines (economics and computer science, respectively) branched off from statistics to develop tools tailored to their own areas. For several decades, the two fields developed more or less independently and in parallel, each borrowing from statistics but neither really paying attention to the other. As a result, in some places the two fields use the same tools under different names, and in other places they do fundamentally different things.