Welcome to Data Analysis in Python!

Python is an increasingly popular tool for data analysis. In recent years, a number of libraries have reached maturity, allowing R and Stata users to take advantage of the beauty, flexibility, and performance of Python without sacrificing the functionality these older programs have accumulated over the years.

This site is designed to offer an introduction to Python specifically tailored for social scientists and people doing applied data analysis – users with little or no serious programming experience who just want to get things done, and who have experience with programs like R and Stata but are anxious for something better.

NOTE TO USERS: In recent months, I've found myself switching more and more from Python to a new language called Julia. Julia is not without its weaknesses (most importantly, it's relatively young, so its package ecosystem isn't quite as well developed as Python's), but it was designed from the ground up for data analysis in a way Python was not. It also has a number of features I think are really appealing for researchers, which I outline in a talk I gave at Vanderbilt here (sorry for the so-so audio). And if you are already familiar with Python, you'll find it pretty easy to pick up (Julia was clearly written by Python, R, and Matlab programmers). Anyway, it may be a little young for some people, but since it's what I'm using more and more, I thought I should mention it here!


  1. Core Skill Sequence: A collection of four numbered tutorials that cover the core skills everyone needs to work in Python in social science. I recommend you visit these in sequence: a guide to setting up Python on your computer using the Anaconda distribution, an intro to Python for those not familiar with the language, an introduction to the pandas library for working with tabular data (analogous to data.frames in R, or everything you ever did in Stata), and a guide to installing libraries to expand Python.
  2. Specific Resources for Different Research Topics: “topic” pages, which you should feel free to jump between as appropriate for your purposes: statsmodels, quantecon, and stan for econometrics, machine learning with scikit-learn, seaborn and ggplot for graphing, network analysis using igraph, geo-spatial analysis, ways to accelerate Python, big data tools, and text analysis libraries. The topic pages also include two topics that are a little unusual, but that I think are potentially quite useful: a guide to getting effective help online, and resources on evidence-based research on how to teach programming, for anyone teaching this material.
  3. Resources for Other Software Tools: Resources on tools and programs you may come across while using Python, with a description of each tool, guidance on what you most need to know, and links to other tutorials. These include pages on the Command Line, IPython, and Git and GitHub.

Ready to get started? Head on over to Setup!

Questions or comments? Please send them my way! Feedback of all sorts is greatly appreciated, and if you have any experience with GitHub, suggested changes to this site can also be submitted as pull requests here.