Learn to scrape, parse, analyze, and visualize data for exploratory analysis and quantitative research. No previous programming knowledge is assumed.
The workshop's goals include unshackling academic researchers from the constraints of commercial, general-purpose statistics/GIS software and freeing them from the limitations of working with pre-existing, pre-formatted data sets. Students in the workshop will learn to write programs in the interpreted programming language Python and in the open-source statistical language and environment R, and will also learn to use databases and to interact with a wide variety of existing software. Teaching two very different programming languages side by side is intentional: the workshop's secondary goals are to demonstrate that data can be created, analyzed, and visualized by a variety of methods, and to encourage students not to be intimidated by unfamiliar programming dialects and interfaces. The workshop will introduce the methods needed to parse text files, scrape data from other sources, write structured programs for statistical analysis, create and query databases, visualize datasets, and conduct network analysis.
|Week 1 (4/4/2013)||Operating Systems and Computer Science Basics|
Introduction to basic concepts of operating systems and programming
languages. Introduction to interpreted programming languages Python
and R, with comparison to SPSS/Stata. Introduction to command-line
Assignment 1 is available.
|Week 2 (4/11/2013)||Data types and data structures|
Lists, arrays, dictionaries. Vectors and matrices. Operations on these data types in Python and R. Introduction to SciPy and its underlying array library, NumPy, for Python.
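A minimal sketch of these data types in Python, assuming NumPy is installed (the variable names and values are illustrative only). Lists and dictionaries are built into the language; vectors and matrices come from NumPy arrays, which, like R vectors, apply arithmetic element-wise.

```python
import numpy as np

# List: ordered and mutable; items are accessed by position
scores = [88, 92, 79]
scores.append(95)            # scores is now [88, 92, 79, 95]

# Dictionary: maps keys to values; items are accessed by key
respondent = {"id": 17, "age": 34}
respondent["party"] = "independent"

# NumPy vector: arithmetic applies to every element, as with R vectors
v = np.array([1.0, 2.0, 3.0])
doubled = v * 2              # array([2., 4., 6.])

# NumPy matrix (2-D array): @ is matrix multiplication, .T is the transpose
m = np.array([[1, 2],
              [3, 4]])
product = m @ m              # [[7, 10], [15, 22]]
transposed = m.T             # [[1, 3], [2, 4]]
```

The R equivalents would be `c()`, named `list()`s, and `matrix()` with `%*%` for matrix multiplication.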
Readings to be done before the Week 2 meeting:
|Week 3 (4/18/2013)||Structured programming and code management|
|Conditional statements, loops, functions, modules. Using a text editor and organizing code in separate files. Introduction to programming styles: functional, imperative, and object-oriented.|
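As an illustration, the sketch below solves one small task (counting respondents by a hypothetical age bracket) in each of the three styles. The function `age_bracket` and the class `BracketCounter` are invented for this example; in a real project `age_bracket` could live in its own file (module) and be imported.

```python
def age_bracket(age):
    """Conditional statements: classify an age into a hypothetical bracket."""
    if age < 30:
        return "young"
    elif age < 60:
        return "middle"
    else:
        return "older"

ages = [22, 45, 67, 31]

# Imperative style: a loop that updates an accumulator step by step
counts_imperative = {}
for a in ages:
    b = age_bracket(a)
    counts_imperative[b] = counts_imperative.get(b, 0) + 1

# Functional style: build the result from expressions, without explicit mutation
brackets = list(map(age_bracket, ages))
counts_functional = {b: brackets.count(b) for b in set(brackets)}

# Object-oriented style: state (counts) and behavior (add) bundled in one object
class BracketCounter:
    def __init__(self):
        self.counts = {}

    def add(self, age):
        b = age_bracket(age)
        self.counts[b] = self.counts.get(b, 0) + 1

counter = BracketCounter()
for a in ages:
    counter.add(a)
```

All three approaches produce the same counts; the difference lies in how the program is organized, which matters more as analyses grow.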
|Week 4 (Friday, 4/26/2013)||Input and Output NOTE: changed meeting date|
Storing analyzed data for later reuse (Python's pickle and
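A minimal sketch of saving results with Python's `pickle` module and reloading them later. The `results` dictionary and the file name `results.pkl` are hypothetical; any picklable Python object can be stored this way.

```python
import os
import pickle
import tempfile

# Results of some hypothetical analysis
results = {"mean_age": 41.25, "n": 4, "brackets": ["young", "middle", "older"]}

path = os.path.join(tempfile.gettempdir(), "results.pkl")

# Save: pickle writes bytes, so the file must be opened in binary mode ("wb")
with open(path, "wb") as f:
    pickle.dump(results, f)

# Load, e.g. in a later session, without rerunning the analysis
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Pickle files are Python-specific and should only be loaded from trusted sources; for sharing data with R or other software, a plain-text format such as CSV is more portable.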
|Week 5 (5/2/2013 in Searle 240B)||Web scraping and Content analysis NOTE: changed location|
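A small sketch of the scrape-then-analyze pattern using only the standard library: `html.parser` extracts visible text and link targets, and `collections.Counter` tallies word frequencies as a very basic form of content analysis. The HTML is an invented inline string so the example runs offline; a real scraper would typically fetch the page first (for example with `urllib.request.urlopen`).

```python
import re
from collections import Counter
from html.parser import HTMLParser

class TextAndLinks(HTMLParser):
    """Collects text content and <a href> targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

    def handle_data(self, data):
        self.chunks.append(data)

# Hypothetical page; normally this would be downloaded
html = """<html><body>
<h1>Council minutes</h1>
<p>The council approved the budget. The budget passed 5 to 2.</p>
<a href="/minutes/2013-04">April</a>
</body></html>"""

parser = TextAndLinks()
parser.feed(html)

# Content analysis: lowercase word frequencies (digits are ignored)
text = " ".join(parser.chunks)
words = re.findall(r"[a-z]+", text.lower())
freq = Counter(words)
```

`freq.most_common(n)` then lists the most frequent terms; the collected `links` could be followed to scrape further pages.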
|Week 6 (5/9/2013)||Plotting, Graphing, Visualization|
|Visualizing descriptive statistics in R (histograms, box plots, scatter plots, etc.). Introduction to matplotlib for Python.|
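Before drawing a box plot or histogram it helps to know which numbers the picture encodes. This sketch, using a hypothetical sample and only the standard library, computes a box plot's five-number summary and a text histogram; in matplotlib, `plt.boxplot` and `plt.hist` (or R's `boxplot()` and `hist()`) would draw these graphically. Quartile conventions differ between tools, so the `method="inclusive"` choice here is an assumption.

```python
import statistics
from collections import Counter

data = [2, 4, 4, 5, 7, 8, 9, 12]  # hypothetical sample

# Five-number summary: the values a box plot draws
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
summary = {"min": min(data), "q1": q1, "median": median, "q3": q3, "max": max(data)}

# Histogram counts with bins of width 5: [0, 5), [5, 10), [10, 15)
width = 5
bins = Counter((x // width) * width for x in data)
for start in sorted(bins):
    print(f"{start:>3}-{start + width - 1:<3} {'#' * bins[start]}")
```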
|Week 7 (5/16/2013)||Data management and Databases|
Introduction to relational databases, which provide stable and
network-accessible storage of medium-to-large (but not very
large) datasets. Comparison of spreadsheet interfaces (Excel, SPSS) with
relational databases such as MySQL. Introduction to the MySQLdb
library for Python.
SQL books are available on ProQuest.
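The sketch below shows the create/insert/query cycle with Python's built-in `sqlite3` module instead of MySQLdb, because SQLite needs no database server. Both follow Python's DB-API, so the connect, cursor, execute, and fetch steps look much the same; one difference is that MySQLdb uses `%s` placeholders where `sqlite3` uses `?`. The table and rows are hypothetical.

```python
import sqlite3

# In-memory database; with MySQLdb, connect() would instead take the
# connection details of a MySQL server
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE respondents (id INTEGER PRIMARY KEY, age INTEGER, party TEXT)")
rows = [(1, 22, "dem"), (2, 45, "rep"), (3, 67, "dem"), (4, 31, "ind")]
# Placeholders let the driver quote values safely instead of pasting strings into SQL
cur.executemany("INSERT INTO respondents VALUES (?, ?, ?)", rows)
conn.commit()

# The kind of summary a spreadsheet user might build with a pivot table
cur.execute(
    "SELECT party, COUNT(*), AVG(age) FROM respondents GROUP BY party ORDER BY party"
)
by_party = cur.fetchall()
conn.close()
```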
|Week 8 (5/23/2013)||Data Analysis & Data Mining|
|Building structured, multi-stage programs that systematically evaluate hypotheses and explore patterns in data.|
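One way such a program might be organized is as a chain of small functions, one per stage: load, clean, test, report. In this sketch the data are invented, and the hypothesis test is a one-sided permutation test on a difference in means, chosen here for illustration because it needs no distributional assumptions or extra libraries; it is not necessarily the method the workshop uses.

```python
import random
import statistics

def load():
    # Stage 1: hypothetical scores for two groups (None marks a missing value)
    return {"treatment": [12, 15, 14, 16, 18, None], "control": [10, 11, 13, 9, 12]}

def clean(data):
    # Stage 2: drop missing values
    return {group: [x for x in xs if x is not None] for group, xs in data.items()}

def mean_difference(a, b):
    return statistics.mean(a) - statistics.mean(b)

def permutation_test(a, b, n_iter=10_000, seed=0):
    # Stage 3: how often does randomly relabeling the pooled data produce a
    # difference at least as large as the observed one? (one-sided p-value)
    rng = random.Random(seed)
    observed = mean_difference(a, b)
    pooled = a + b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if mean_difference(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return observed, hits / n_iter

def report(observed, p):
    # Stage 4: summarize the result
    return f"difference in means = {observed:.2f}, one-sided p = {p:.4f}"

data = clean(load())
observed, p = permutation_test(data["treatment"], data["control"])
print(report(observed, p))
```

Keeping each stage separate makes it easy to rerun a single step, swap in a different test, or apply the same pipeline to a new data set.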
|Week 9 (5/30/2013)||Network Analysis|
|Basics of network representation, manipulation, and visualization in Python's networkx library and R's sna, network, and igraph packages.|
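To show what a network library does underneath, this sketch represents a small hypothetical undirected network as a dictionary of neighbor sets, computes each node's degree, and finds a shortest path with breadth-first search. In networkx, `nx.Graph()`, `G.degree`, and `nx.shortest_path` provide the same operations; plain Python is used here so the example has no dependencies.

```python
from collections import deque

# Hypothetical collaboration network as an edge list
edges = [("Ana", "Ben"), ("Ben", "Cai"), ("Cai", "Dee"), ("Ana", "Cai"), ("Dee", "Eli")]

# Representation: adjacency sets; each undirected edge is stored in both directions
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

# Degree: number of ties each node has
degree = {node: len(nbrs) for node, nbrs in graph.items()}

def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a list of nodes, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through predecessors to rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nbr in sorted(graph[node]):  # sorted for a deterministic result
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return None
```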
|Week 10 (6/6/2013)||TBA|
1 Students will be required to install Python and R distributions on their own computers.