Computing for the Social Sciences and Humanities, Spring 2013

Workshop Information

UPDATED for April—June 2013!

  • Instructors: Michael Castelle (mcc@), Peter McMahan (mcmahan@), Department of Sociology
  • Office Hours: Weekly help session, or by appointment.
  • Meeting day and time: Thursday 3:00-4:30pm (Except Week 4: Friday 4/26/13, 3-4:30pm)
  • Location: Searle 240A (Except Week 5: Searle 240B)
  • Help Session day and time: Thursday 4:30-6:00pm
  • Prerequisites: None
  • Requirements: BYO Laptop [1]

Learn to scrape, parse, analyze, and visualize data for exploratory analysis and quantitative research. No previous programming knowledge is assumed.

Workshop Goals and Topics

This is a non-credit applied workshop focusing on a pragmatic understanding of programming languages and software libraries. It is oriented specifically towards students in the social sciences and humanities whose emerging research projects require basic programming skills.


The workshop's goals are to unshackle academic researchers from the constraints of commercial, general-purpose statistics/GIS software and to free them from the limitations of working with pre-existing and pre-formatted data sets. Students in the workshop will learn to write programs in the interpreted programming language Python and the (open-source) statistical software language/environment R, and will also learn to use databases and to interact with a wide variety of existing software. The simultaneous instruction of two very different principal programming languages is intentional: the workshop's secondary goals are to demonstrate that data can be created, analyzed, and visualized by a diversity of methods, and to encourage students not to be intimidated by unfamiliar computer programming dialects and interfaces. The workshop will introduce the methods required to parse text files, scrape data from other sources, write structured programs for statistical analysis, create and query databases, visualize datasets, and conduct network analysis.

Assignments and Help Sessions

Each weekly meeting will be accompanied by an openly collaborative, take-home programming assignment which will be due before the following class. After each class, a help session and tutoring hours will be provided for those requiring extra guidance on the assignment. The programming assignments will often be cumulative and build on one another, so completing the functionality of each assignment is crucial.

Spring Schedule

Week 1 (4/4/2013) Operating Systems and Computer Science Basics
Introduction to basic concepts of operating systems and programming languages. Introduction to interpreted programming languages Python and R, with comparison to SPSS/Stata. Introduction to command-line interfaces.
Assignment 1 is available here
Week 2 (4/11/2013) Data types and data structures

Lists, arrays, dictionaries. Vectors and matrices. Operations on these data types in Python and R. Introduction to the scipy data structure library for Python.
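As a rough illustration of the Week 2 topics, the sketch below tallies a list of values into a dictionary using only built-in Python types. The survey responses and the function name count_values are invented for this example and are not taken from the workshop materials.

```python
# Hypothetical survey responses, invented for illustration only.
responses = ["agree", "disagree", "agree", "neutral", "agree"]


def count_values(values):
    """Return a dictionary mapping each distinct value to its frequency."""
    counts = {}
    for value in values:
        # dict.get returns 0 the first time a value is seen.
        counts[value] = counts.get(value, 0) + 1
    return counts


counts = count_values(responses)
print(counts)
```

A list keeps the responses in their original order, while the dictionary gives direct lookup by response label, which is roughly what a frequency table provides in a statistics package.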

Readings to be done before Week 2 meeting:

Week 3 (4/18/2013) Structured programming and code management
Conditional statements, loops, functions, modules. Using a text editor and organizing code in separate files. Introduction to programming styles—functional, imperative, object-oriented.
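A minimal sketch of how a conditional, a loop, and two small functions can fit together is given below. The function names mean and summarize and the sample data are assumptions made for illustration; they do not come from the assignments.

```python
def mean(values):
    """Return the arithmetic mean of a list of numbers, or None if it is empty."""
    if not values:  # conditional: guard against dividing by zero
        return None
    total = 0.0
    for v in values:  # loop: accumulate the sum one element at a time
        total += v
    return total / len(values)


def summarize(groups):
    """Apply mean() to every group in a dictionary of lists."""
    return {name: mean(vals) for name, vals in groups.items()}


# Hypothetical data: ages recorded for two groups, one of them empty.
ages_by_group = {"treatment": [31, 45, 28], "control": []}
print(summarize(ages_by_group))
```

Saving these functions in their own file would let other scripts import them as a module, which is one way of organizing code in separate files.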

Assignment 2

Week 4 (Friday, 4/26/2013) Input and Output (NOTE: changed meeting date)
Storing analyzed data for later reuse (Python's pickle and cPickle modules).
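The sketch below shows one way to store a Python object with pickle and read it back later. It assumes Python 3, where the separate cPickle module no longer exists and the standard pickle module uses the faster C implementation automatically. The file name, helper names, and result values are hypothetical.

```python
import os
import pickle
import tempfile


def save_results(results, path):
    """Write any picklable Python object to a binary file."""
    with open(path, "wb") as f:
        pickle.dump(results, f)


def load_results(path):
    """Read back an object previously written by save_results()."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical analysis output, invented for illustration.
results = {"n": 120, "mean_age": 34.5, "labels": ["urban", "rural"]}

# A temporary directory keeps the example from cluttering the working folder.
path = os.path.join(tempfile.mkdtemp(), "results.pkl")
save_results(results, path)
restored = load_results(path)
print(restored == results)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.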

Week 5 (5/2/2013 in Searle 240B) Web scraping and Content analysis (NOTE: changed location)
Week 6 (5/9/2013) Plotting, Graphing, Visualization
Visualizing descriptive statistics in R (histograms, box plots, scatter plots, etc.) Introduction to matplotlib for Python.
Week 7 (5/16/2013) Data management and Databases
Introduction to relational databases, which provide stable and network-accessible storage of medium-to-large (but not very large) datasets. Comparison of spreadsheet interfaces (Excel, SPSS) with relational databases like MySQL. Introduction to MySQLdb library for Python.

SQL books available on Proquest
  • Beaulieu, Alan. Learning SQL, Second Edition, 2009. There's also a copy of this in SS402 if you want to check it out in person.
  • Kline, Kevin. SQL in a Nutshell, 2008. This is mostly just a command reference, which can be useful when you want to understand the many parameters to e.g., ALTER TABLE. The book tries to distinguish between variant syntax for Microsoft SQL (which we hope you won't ever have to use) and MySQL.
  • Molinaro, Anthony. SQL Cookbook, 2005. This book ranges from simple questions ("How do I insert a new row?") to more complex ones, and is a good companion to a tutorial like Learning SQL.
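
To give a feel for creating and querying a relational table from Python, here is a minimal sketch. Note that it substitutes the standard-library sqlite3 module for MySQL and MySQLdb so that it runs without a database server; the table, columns, and rows are invented for illustration.

```python
import sqlite3

# An in-memory database stands in for a MySQL server in this sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute(
    "CREATE TABLE respondents (id INTEGER PRIMARY KEY, city TEXT, age INTEGER)"
)

# Hypothetical rows, invented for illustration.
rows = [(1, "Chicago", 31), (2, "Chicago", 45), (3, "Evanston", 28)]
# Placeholders keep the data separate from the SQL text.
cur.executemany("INSERT INTO respondents VALUES (?, ?, ?)", rows)
conn.commit()

# Count respondents and average their ages within each city.
cur.execute(
    "SELECT city, COUNT(*), AVG(age) FROM respondents "
    "GROUP BY city ORDER BY city"
)
summary = cur.fetchall()
print(summary)

conn.close()
```

The SQL statements carry over to MySQL largely unchanged, although MySQLdb writes placeholders as %s rather than ?.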

Week 8 (5/23/2013) Data Analysis & Data Mining
Building structured, multi-stage programs that systematically evaluate hypotheses and explore patterns in data.
Week 9 (5/30/2013) Network Analysis
Basics of network representation, manipulation and visualization in Python's networkx library and R's sna, network, igraph.
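As a sketch of the basic idea behind network representation, the example below stores an undirected network as a dictionary of neighbor sets and computes each node's degree. It deliberately uses plain Python rather than networkx, sna, network, or igraph, and the edge list and function names are hypothetical.

```python
# Hypothetical friendship ties, invented for illustration.
edges = [("ana", "ben"), ("ana", "cam"), ("ben", "cam"), ("cam", "dee")]


def build_adjacency(edge_list):
    """Build an undirected adjacency structure: node -> set of neighbors."""
    adjacency = {}
    for a, b in edge_list:
        # Record the tie in both directions because it is undirected.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degrees(adjacency):
    """Return each node's degree, i.e. its number of neighbors."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


adjacency = build_adjacency(edges)
print(degrees(adjacency))
```

Dedicated network libraries typically keep a similar node-to-neighbors structure internally and add algorithms and plotting on top of it.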
Week 10 (6/6/2013) TBA

Software

Readings

Selected readings will be assigned from the following texts (some available in their entirety online):

Additional Texts

Lecture Slides

Local Meetups

UPDATED for 2013!

One easy, ethnographically-oriented way to help adapt to contemporary computing culture is to crash one of Chicago's many friendly user groups and meetups, which typically offer free pizza, beer, and PowerPoint presentations in various downtown office spaces and taverns.


[1] Students will be required to install Python and R distributions on their own computers.