Econometrics¶
There are three main sets of tools for econometrics: statsmodels, quantecon, and (for Bayesians) Stan.
Stats with StatsModels¶
statsmodels is the go-to library for doing econometrics (linear regression, logit regression, etc.).
You can find a good tutorial here, and a brand new book built around statsmodels here (with lots of example code here).
The most important topics are also covered in the statsmodels documentation here, especially the pages on OLS here and here.
(If you want to do machine learning, by the way, or want to know the difference between statsmodels and the machine learning library scikit-learn, head over to Machine Learning with scikit-learn.)
Here are some simple illustrative examples of standard OLS. On with the show:
# Load pandas and statsmodels
In [1]: import pandas as pd
In [2]: import statsmodels.formula.api as smf
# Load a csv dataset of World Development Indicators
In [3]: my_data = pd.read_csv('wdi_indicators.csv')
# Look at first three lines
In [4]: my_data.head(3)
Out[4]:
year country_name country_code gdp_per_cap literacy_rate \
0 2011 Afghanistan AFG 1712.588720 31.741117
1 2011 Albania ALB 9640.130216 96.845299
2 2011 Algeria DZA 12964.827210 NaN
life_expectancy population_density region
0 60.065366 44.127634 NaN
1 77.163220 106.013869 NaN
2 70.751683 15.416096 NaN
# OLS
In [5]: results = smf.ols('life_expectancy ~ population_density + gdp_per_cap',
...: data=my_data).fit()
...:
In [6]: print(results.summary())
OLS Regression Results
==============================================================================
Dep. Variable: life_expectancy R-squared: 0.378
Model: OLS Adj. R-squared: 0.372
Method: Least Squares F-statistic: 65.10
Date: Sun, 07 Aug 2016 Prob (F-statistic): 8.23e-23
Time: 09:27:24 Log-Likelihood: -734.50
No. Observations: 217 AIC: 1475.
Df Residuals: 214 BIC: 1485.
Df Model: 2
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [95.0% Conf. Int.]
--------------------------------------------------------------------------------------
Intercept 65.0441 0.660 98.610 0.000 63.744 66.344
population_density -0.0008 0.000 -2.033 0.043 -0.002 -2.38e-05
gdp_per_cap 0.0003 2.75e-05 11.023 0.000 0.000 0.000
==============================================================================
Omnibus: 44.979 Durbin-Watson: 2.081
Prob(Omnibus): 0.000 Jarque-Bera (JB): 66.635
Skew: -1.226 Prob(JB): 3.39e-15
Kurtosis: 4.166 Cond. No. 3.55e+04
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.55e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
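# The same formula interface covers other standard estimators, e.g. logit.
# A hedged sketch (not part of the session above); the 'high_life_exp' column
# and the 70-year cutoff are constructed here purely for illustration:
my_data['high_life_exp'] = (my_data['life_expectancy'] > 70).astype(int)
logit_results = smf.logit('high_life_exp ~ gdp_per_cap + population_density',
                          data=my_data).fit()
print(logit_results.summary())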
# Categorical Vars are easy
# Make categorical var
In [7]: my_data['low_income'] = my_data['gdp_per_cap'] < 4000
In [8]: results2 = smf.ols('life_expectancy ~ population_density + gdp_per_cap + C(low_income)', data=my_data).fit()
In [9]: print(results2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: life_expectancy R-squared: 0.580
Model: OLS Adj. R-squared: 0.574
Method: Least Squares F-statistic: 97.92
Date: Sun, 07 Aug 2016 Prob (F-statistic): 7.27e-40
Time: 09:27:24 Log-Likelihood: -692.02
No. Observations: 217 AIC: 1392.
Df Residuals: 213 BIC: 1406.
Df Model: 3
Covariance Type: nonrobust
=========================================================================================
coef std err t P>|t| [95.0% Conf. Int.]
-----------------------------------------------------------------------------------------
Intercept 69.7352 0.715 97.543 0.000 68.326 71.144
C(low_income)[T.True] -10.6282 1.052 -10.103 0.000 -12.702 -8.555
population_density -0.0003 0.000 -0.920 0.358 -0.001 0.000
gdp_per_cap 0.0002 2.57e-05 7.022 0.000 0.000 0.000
==============================================================================
Omnibus: 75.439 Durbin-Watson: 2.145
Prob(Omnibus): 0.000 Jarque-Bera (JB): 205.379
Skew: -1.527 Prob(JB): 2.53e-45
Kurtosis: 6.659 Cond. No. 7.67e+04
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.67e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
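# Everything in the printed table is also available programmatically on the
# fitted results object. A quick sketch using the results2 object from above:
results2.params        # coefficient estimates as a pandas Series
results2.pvalues       # p-values for each coefficient
results2.conf_int()    # confidence intervals as a DataFrame
results2.rsquared      # the R-squared reported in the summary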
# Heteroskedastic-Robust Standard Errors
In [10]: results2_robust = results2.get_robustcov_results()
In [11]: print(results2_robust.summary())
OLS Regression Results
==============================================================================
Dep. Variable: life_expectancy R-squared: 0.580
Model: OLS Adj. R-squared: 0.574
Method: Least Squares F-statistic: 83.84
Date: Sun, 07 Aug 2016 Prob (F-statistic): 7.43e-36
Time: 09:27:24 Log-Likelihood: -692.02
No. Observations: 217 AIC: 1392.
Df Residuals: 213 BIC: 1406.
Df Model: 3
Covariance Type: HC1
=========================================================================================
coef std err t P>|t| [95.0% Conf. Int.]
-----------------------------------------------------------------------------------------
Intercept 69.7352 0.918 75.969 0.000 67.926 71.545
C(low_income)[T.True] -10.6282 1.203 -8.832 0.000 -13.000 -8.256
population_density -0.0003 0.000 -0.851 0.396 -0.001 0.000
gdp_per_cap 0.0002 3.96e-05 4.564 0.000 0.000 0.000
==============================================================================
Omnibus: 75.439 Durbin-Watson: 2.145
Prob(Omnibus): 0.000 Jarque-Bera (JB): 205.379
Skew: -1.527 Prob(JB): 2.53e-45
Kurtosis: 6.659 Cond. No. 7.67e+04
==============================================================================
Warnings:
[1] Standard Errors are heteroscedasticity robust (HC1)
[2] The condition number is large, 7.67e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
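# Robust covariances can also be requested directly at fit time, and the same
# mechanism gives cluster-robust standard errors. A hedged sketch; it assumes
# the 'region' column holds a usable grouping variable (it has missing values
# in this dataset, so those rows are dropped first):
hc1_fit = smf.ols('life_expectancy ~ gdp_per_cap + C(low_income)',
                  data=my_data).fit(cov_type='HC1')
grouped = my_data.dropna(subset=['region'])
cluster_fit = smf.ols('life_expectancy ~ gdp_per_cap + C(low_income)',
                      data=grouped).fit(cov_type='cluster',
                                        cov_kwds={'groups': grouped['region']})
print(cluster_fit.summary())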
# Output to LaTeX
In [12]: latex = results2_robust.summary().as_latex()
In [13]: latex
Out[13]: '\\begin{center}\n\\begin{tabular}{lclc}\n\\toprule\n\\textbf{Dep. Variable:} & life_expectancy & \\textbf{ R-squared: } & 0.580 \\\\\n\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.574 \\\\\n\\textbf{Method:} & Least Squares & \\textbf{ F-statistic: } & 83.84 \\\\\n\\textbf{Date:} & Sun, 07 Aug 2016 & \\textbf{ Prob (F-statistic):} & 7.43e-36 \\\\\n\\textbf{Time:} & 09:27:24 & \\textbf{ Log-Likelihood: } & -692.02 \\\\\n\\textbf{No. Observations:} & 217 & \\textbf{ AIC: } & 1392. \\\\\n\\textbf{Df Residuals:} & 213 & \\textbf{ BIC: } & 1406. \\\\\n\\textbf{Df Model:} & 3 & \\textbf{ } & \\\\\n\\bottomrule\n\\end{tabular}\n\\begin{tabular}{lccccc}\n & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$>$$|$t$|$} & \\textbf{[95.0\\% Conf. Int.]} \\\\\n\\midrule\n\\textbf{Intercept} & 69.7352 & 0.918 & 75.969 & 0.000 & 67.926 71.545 \\\\\n\\textbf{C(low_income)[T.True]} & -10.6282 & 1.203 & -8.832 & 0.000 & -13.000 -8.256 \\\\\n\\textbf{population_density} & -0.0003 & 0.000 & -0.851 & 0.396 & -0.001 0.000 \\\\\n\\textbf{gdp_per_cap} & 0.0002 & 3.96e-05 & 4.564 & 0.000 & 0.000 0.000 \\\\\n\\bottomrule\n\\end{tabular}\n\\begin{tabular}{lclc}\n\\textbf{Omnibus:} & 75.439 & \\textbf{ Durbin-Watson: } & 2.145 \\\\\n\\textbf{Prob(Omnibus):} & 0.000 & \\textbf{ Jarque-Bera (JB): } & 205.379 \\\\\n\\textbf{Skew:} & -1.527 & \\textbf{ Prob(JB): } & 2.53e-45 \\\\\n\\textbf{Kurtosis:} & 6.659 & \\textbf{ Cond. No. } & 7.67e+04 \\\\\n\\bottomrule\n\\end{tabular}\n%\\caption{OLS Regression Results}\n\\end{center}'
# Save to disk
In [14]: with open("regression_table.tex", "w") as text_file:
....: text_file.write(latex)
....:
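If you are comparing several specifications, statsmodels can also stack fitted models into a single table with summary_col. A minimal sketch, reusing the results and results2 objects fitted above (the model names are just labels chosen for this example):
import pandas as pd
from statsmodels.iolib.summary2 import summary_col

# Put the two fitted models side by side, with significance stars
comparison = summary_col([results, results2],
                         stars=True,
                         model_names=['Base', 'With low_income'])
print(comparison)

# Like the single-model summary, the comparison table exports to LaTeX
with open("comparison_table.tex", "w") as text_file:
    text_file.write(comparison.as_latex())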
QuantEcon¶
QuantEcon is a newer library aimed specifically at economists, with some tools not found in statsmodels. A full index of its features is here.
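To give a flavor of what it offers, here is a minimal sketch, assuming the library is installed and importable as quantecon, that builds a two-state Markov chain, computes its stationary distribution, and simulates a path:
import quantecon as qe

# Transition matrix for a two-state chain (rows sum to one)
P = [[0.9, 0.1],
     [0.4, 0.6]]

mc = qe.MarkovChain(P, state_values=['boom', 'recession'])
print(mc.stationary_distributions)   # long-run probability of each state
print(mc.simulate(ts_length=10))     # a simulated path of length 10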
PyStan¶
PyStan is the Python interface for the Stan library, a set of tools for statisticians, especially Bayesians. You can find resources on Stan in general here, and on PyStan in particular here.
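To give a sense of the workflow, here is a minimal sketch of a Bayesian linear regression on simulated data. It assumes the older PyStan 2 interface (pystan.StanModel plus sampling); newer releases of PyStan changed the API, so check the documentation for your installed version:
import numpy as np
import pystan

# A simple linear regression written in the Stan modeling language
model_code = """
data {
    int<lower=0> N;
    vector[N] x;
    vector[N] y;
}
parameters {
    real alpha;
    real beta;
    real<lower=0> sigma;
}
model {
    y ~ normal(alpha + beta * x, sigma);
}
"""

# Simulated data purely for illustration
x = np.random.normal(size=100)
y = 1.0 + 2.0 * x + np.random.normal(size=100)

model = pystan.StanModel(model_code=model_code)        # compile the model
fit = model.sampling(data={'N': 100, 'x': x, 'y': y},  # run the MCMC sampler
                     iter=1000, chains=4)
print(fit)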