Computing for Social Sciences, Spring 2014

Course Information

UPDATED for Spring 2014!

  • Supervisor: James Evans (jevans@), Department of Sociology
  • Instructors: Peter McMahan (mcmahan@) and Michael Castelle (mcc@), Sociology
  • Office Hours: Weekly help session, or by appointment.
  • Meeting day and time: Friday 1:30-4:30pm
  • Location: Pick 016
  • Prerequisites: None
  • Requirements: BYO Laptop [1]

Learn to scrape, parse, analyze, and visualize data for exploratory analysis and quantitative research. No previous programming knowledge is assumed.

Course Goals and Topics

This is a for-credit applied course focusing on a pragmatic understanding of programming languages and software libraries, specifically oriented towards students in the social sciences and humanities with emerging research projects requiring basic programming skills.

The course's goals include unshackling academic researchers from the constraints of commercial, general-purpose statistics/GIS software and freeing them from the limitations of working with pre-existing and pre-formatted data sets. Students in the course will learn to write programs in the interpreted programming language Python and in R, the open-source statistical language and environment, and will also learn to use databases and to interact with a wide variety of existing software. The simultaneous instruction of two very different principal programming languages is intentional: the course's secondary goals are to demonstrate that data can be created, analyzed, and visualized by a diversity of methods, and to encourage students not to be intimidated by unfamiliar computer programming dialects and interfaces. The course will introduce the methods required to parse text files, scrape data from other sources, write structured programs for statistical analysis, create and query databases, visualize datasets, and conduct network analysis.

Assignments and Help Sessions

Each weekly meeting will be accompanied by a take-home programming assignment that is due before the following class. After each class, a help session and tutoring hours will be provided for those requiring extra guidance on the assignment. The programming assignments will often be cumulative and build on one another, so getting each assignment fully working is crucial.

This year we will be using Piazza for class discussion. Rather than emailing questions to the teaching staff, we encourage you to post them there.

Spring Schedule

Week 1 (4/4/2014) Operating Systems and Computer Science Basics
Introduction to basic concepts of programming languages and operating systems. Introduction to the interpreted programming languages Python and R, with comparison to SPSS/Stata. Introduction to command-line interfaces.
Assignment 1, due 4/10/2014.
Week 2 (4/11/2014) Data types and data structures

Lists, arrays, dictionaries. Vectors and matrices. Operations on these data types in Python and R. Introduction to the scipy data structure library for Python.
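
As a rough taste of these data types, here is a minimal sketch in Python. It assumes Python 3, and every value in it is made up for illustration; the variable names are hypothetical and not taken from the course materials.

```python
# Illustrative only: all values below are made up.
ages = [23, 35, 41, 35]                                  # list: ordered, allows duplicates
party_by_id = {"r1": "Dem", "r2": "Rep", "r3": "Dem"}    # dictionary: key -> value

# A matrix represented as a list of rows (a hypothetical 2x3 table of counts).
counts = [[1, 2, 3],
          [4, 5, 6]]

# Mean of a list (true division, as in Python 3).
mean_age = sum(ages) / len(ages)

# Transpose the matrix: rows become columns.
transposed = [list(col) for col in zip(*counts)]

# Tally how often each dictionary value occurs.
tally = {}
for party in party_by_id.values():
    tally[party] = tally.get(party, 0) + 1
```

In R, the closest analogues would be vectors, named lists, and matrices.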

Readings to be done before Week 2 meeting:

Assignment 2, due 4/17/2014.

Week 2 slides as PDF.

Week 3 (4/18/2014) Structured programming and code management
Conditional statements, loops, functions, modules. Using a text editor and organizing code in separate files. Introduction to programming styles—functional, imperative, object-oriented.

Readings to be done before Week 3 meeting:

Assignment 3, due 5/2/2014.

Week 3 slides as PDF.

Week 4 (4/25/2014) Input and Output
Storing analyzed data for later reuse (Python's pickle and cPickle modules). Reading and writing text files for other programs to use.
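
A minimal sketch of both ideas, assuming Python 3 (where cPickle has been folded into the pickle module). The file names and the word counts are hypothetical, and the files are written to a temporary directory so that nothing on your machine is overwritten.

```python
import os
import pickle
import tempfile

word_counts = {"cthulhu": 12, "night": 7}   # made-up analysis result

workdir = tempfile.mkdtemp()
pickle_path = os.path.join(workdir, "counts.pkl")   # hypothetical file name
text_path = os.path.join(workdir, "counts.tsv")     # hypothetical file name

# Store the Python object for later reuse.
with open(pickle_path, "wb") as f:
    pickle.dump(word_counts, f)

# Load it back, as a later session would.
with open(pickle_path, "rb") as f:
    restored = pickle.load(f)

# Write tab-separated text that other programs (R, Excel, Stata) can read.
with open(text_path, "w") as f:
    for word, n in sorted(restored.items()):
        f.write("%s\t%d\n" % (word, n))

# Read the text file back as a list of lines.
with open(text_path) as f:
    lines = f.read().splitlines()
```

Pickle files are convenient for Python-to-Python reuse, while plain text is the safer choice when another program has to read the result.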

To follow along in class, please download this list of stopwords and this collection of H. P. Lovecraft stories.
Week 4 slides as PDF.

Week 5 (5/2/2014) Web scraping and Content analysis
Automating data collection from structured web pages. Rudimentary natural language processing.
For lecture this week you will need the tokenizer you wrote the previous week in class (or you can download or week5_template.R). You will also need the same list of stopwords and collection of H. P. Lovecraft stories.
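
If you have misplaced your tokenizer, something along the following lines may be enough to follow along. This is only a sketch, not the in-class version: the function names, the tiny stopword set, and the sample sentence are all stand-ins, and in class you would load the provided stopword list and stories from their files instead.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def count_words(text, stopwords):
    """Count the tokens in text that are not in the stopword set."""
    return Counter(t for t in tokenize(text) if t not in stopwords)

# Stand-in data; replace with the course's stopword list and story texts.
stopwords = {"the", "and", "in", "was"}
sample = "The night was dark, and the night was cold in the old house."

counts = count_words(sample, stopwords)
top_word, top_count = counts.most_common(1)[0]
```

Word counts with stopwords removed are about the simplest form of content analysis, and they are a natural input for the plotting covered later in the quarter.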
Week 5 slides as PDF.
Week 6 (5/9/2014) Data management and Databases
Introduction to relational databases, which provide stable and network-accessible storage of medium-to-large (but not very large) datasets. Comparison of spreadsheet interfaces (Excel, SPSS) with relational databases like MySQL. Introduction to MySQLdb library for Python.
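
To get a feel for the create/insert/query cycle without a MySQL server, the sketch below uses Python's built-in sqlite3 module as a stand-in. It is not MySQLdb: the table, columns, and rows are invented for illustration, and MySQLdb marks query parameters with %s rather than sqlite3's ?.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table of survey respondents.
cur.execute(
    "CREATE TABLE respondents (id INTEGER PRIMARY KEY, city TEXT, age INTEGER)"
)
rows = [(1, "Chicago", 23), (2, "Chicago", 41), (3, "Evanston", 35)]
cur.executemany("INSERT INTO respondents VALUES (?, ?, ?)", rows)
conn.commit()

# Aggregate in SQL instead of in a spreadsheet: count and mean age per city.
cur.execute(
    "SELECT city, COUNT(*), AVG(age) FROM respondents "
    "GROUP BY city ORDER BY city"
)
summary = cur.fetchall()
conn.close()
```

Both modules follow the same Python database API (connect, cursor, execute, fetch), so the overall shape should carry over to MySQL.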

Assignment 4, due 5/2/2014.

Week 6 slides as PDF.

SQL books available on Proquest
  • Beaulieu, Alan. Learning SQL, Second Edition, 2009. There's also a copy of this in SS402 if you want to check it out in person.
  • Kline, Kevin. SQL in a Nutshell, 2008. This is mostly just a command reference, which can be useful when you want to understand the many parameters to e.g., ALTER TABLE. The book tries to distinguish between variant syntax for Microsoft SQL (which we hope you won't ever have to use) and MySQL.
  • Molinaro, Anthony. SQL Cookbook, 2005. This book ranges from simple questions ("How do I insert a new row?") to the more complex, and is a good companion to a tutorial like Learning SQL.

Week 7 (5/16/2014) Plotting, Graphing, Visualization
Visualizing descriptive statistics in R (histograms, box plots, scatter plots, etc.). Introduction to matplotlib for Python.

Week 7 slides as PDF.

Week 8 (5/23/2014) Web development + network analysis
Week 8 slides on web development and Flask.

Week 9 (5/30/2014) Network Analysis
Basics of network representation, manipulation and visualization in Python's networkx library and R's sna, network, igraph.
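
Before reaching for a library, it can help to see how little is needed to represent a network. The sketch below uses plain Python 3 rather than networkx, with a made-up edge list of undirected ties, and computes each node's degree and the overall density.

```python
from collections import defaultdict

# Made-up undirected ties between four hypothetical people.
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]

# Adjacency representation: each node maps to the set of its neighbors.
adjacency = defaultdict(set)
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)

# Degree: number of neighbors per node.
degree = {node: len(nbrs) for node, nbrs in adjacency.items()}

# Density of an undirected network: observed ties over possible ties.
n = len(adjacency)
density = 2.0 * len(edges) / (n * (n - 1))
```

Libraries such as networkx and igraph wrap this kind of structure and add layout, visualization, and many more measures.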

Week 8/9 slides combined, including web development with Rook.

Week 10 (6/6/2014) TBA



Selected readings will be assigned from the following texts (some available in their entirety online):

Additional Texts

Local Meetups

UPDATED for 2014!

One easy, ethnographically-oriented way to help adapt to contemporary computing culture is to crash one of Chicago's many friendly user groups and meetups, which typically offer free pizza, beer, and PowerPoint presentations in various downtown office spaces and taverns.

[1] Students will be required to install Python and R distributions on their own computers.