HW02: Exploring and visualizing data
Due by 9:30am (Chicago) on October 12th.
Now that you’ve demonstrated your software is set up, the goal of this assignment is to practice transforming and exploring data.
Go here to fork the repo.
Exploring clean data
The United States experiences far more mass shooting events than any other developed country in the world. While policymakers, politicians, the media, activists, and the general public recognize the widespread prevalence of these tragic events, policies intended to stop these events should be grounded in evidence and empirical data. Regrettably, mass shootings are not well-documented in the United States, and generalizable data is difficult to collect.
In July 2012, in the aftermath of a mass shooting in a movie theater in Aurora, Colorado, Mother Jones published a report on mass shootings in the United States since 1982. Importantly, they provided the underlying data set as an open-source database for anyone interested in studying and understanding this criminal behavior.
Obtain the data
I have included this dataset in the rcfss library on GitHub. To install the package, use the command devtools::install_github("uc-cfss/rcfss") in R. If you don’t already have the devtools library installed, you will get an error. Go back and install this first using install.packages(), then install rcfss. The mass shootings dataset can be loaded using data("mass_shootings"). Use the help function in R (?mass_shootings) to get detailed information on the variables and coding information.
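The install-then-load steps above can be collected into one helper. This is only a sketch: `load_mass_shootings()` is a hypothetical name, not something provided by rcfss, and it assumes you have network access to CRAN and GitHub.

```r
# Hypothetical helper (not part of rcfss): installs whatever is missing,
# then returns the mass_shootings data frame.
load_mass_shootings <- function() {
  # devtools provides install_github(); install it from CRAN if absent
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # Install rcfss from GitHub only if it is not already available
  if (!requireNamespace("rcfss", quietly = TRUE)) {
    devtools::install_github("uc-cfss/rcfss")
  }
  # Load the dataset into a private environment and hand it back
  env <- new.env()
  data("mass_shootings", package = "rcfss", envir = env)
  env$mass_shootings
}
```

Calling `mass_shootings <- load_mass_shootings()` in the console should then give you the data frame; `?mass_shootings` works once rcfss is installed and loaded with `library(rcfss)`.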
Explore the data
Very specific prompts
- Generate a data frame that summarizes the number of mass shootings per year. Print the data frame as a formatted table.
- Generate a bar chart that identifies the number of mass shooters associated with each race category. The bars should be sorted from highest to lowest.
- Generate a boxplot visualizing the number of total victims, by type of location. Redraw the same plot, but remove the Las Vegas Strip massacre from the dataset.
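If you are unsure how to start, the three prompts above follow common dplyr and ggplot2 patterns. The sketch below uses a small made-up data frame, so the column names (`year`, `group`, `victims`) and all values are assumptions for illustration only; check `?mass_shootings` for the real variable names.

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Made-up toy data; column names and values are assumptions, not the real dataset
toy <- tibble(
  year    = c(2000, 2000, 2001, 2001, 2001, 2002),
  group   = c("a", "b", "b", "c", "b", "a"),
  victims = c(4, 7, 5, 60, 9, 6)
)

# One row per year with the number of events in that year
per_year <- count(toy, year, name = "n_events")

# fct_infreq() orders categories from most to least frequent,
# so the bars are drawn from highest to lowest
bar_plot <- ggplot(toy, aes(x = fct_infreq(group))) +
  geom_bar() +
  labs(title = "Events by group (toy data)", x = "Group", y = "Number of events")

# Boxplot of victims by group using every row
box_all <- ggplot(toy, aes(x = group, y = victims)) +
  geom_boxplot() +
  labs(title = "Victims by group (toy data)", x = "Group", y = "Victims")

# Same plot after dropping the single extreme observation
toy_trimmed <- filter(toy, victims < 60)
box_trimmed <- ggplot(toy_trimmed, aes(x = group, y = victims)) +
  geom_boxplot() +
  labs(title = "Victims by group, outlier removed (toy data)",
       x = "Group", y = "Victims")
```

Filtering on a value is only one way to drop a single event; with the real data you may prefer to filter on whichever variable identifies the case.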
More open-ended questions
Answer the following questions. Generate appropriate figures/tables to support your conclusions.
- How many white males with prior signs of mental illness initiated a mass shooting after 2000?
- Which month of the year has the most mass shootings? Generate a bar chart sorted in chronological order to provide evidence of your answer.
- How does the distribution of mass shooting fatalities differ between white and black shooters? What about white and latino shooters?
- Are mass shootings with shooters suffering from mental illness different from mass shootings with no signs of mental illness in the shooter? Assess the relationship between mental illness and total victims, mental illness and race, and the intersection of all three variables.
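Two techniques come up repeatedly in questions like these: filtering on several conditions at once, and keeping months in calendar order rather than alphabetical order. Here is a minimal sketch on made-up data; the columns `year`, `month`, `flag`, and `victims` are hypothetical stand-ins, and the real variables may be coded differently.

```r
library(dplyr)
library(ggplot2)

# Made-up toy data; column names and values are assumptions
toy <- tibble(
  year    = c(1999, 2003, 2008, 2012, 2015, 2017),
  month   = c("Mar", "Dec", "Mar", "Jul", "Dec", "Mar"),
  flag    = c(TRUE, TRUE, FALSE, TRUE, TRUE, NA),
  victims = c(5, 8, 4, 12, 6, 9)
)

# Several conditions in one filter() are combined with AND;
# rows where a condition evaluates to NA are dropped
n_match <- toy %>%
  filter(year > 2000, flag) %>%
  nrow()

# month.abb is R's built-in vector "Jan", ..., "Dec"; using it as the
# factor levels keeps the bars in chronological order
by_month <- toy %>%
  mutate(month = factor(month, levels = month.abb)) %>%
  count(month, .drop = FALSE)  # keep months with zero events

month_plot <- ggplot(by_month, aes(x = month, y = n)) +
  geom_col() +
  labs(title = "Events by month (toy data)", x = "Month", y = "Number of events")

# Compare a numeric variable across groups with a grouped summary
by_flag <- toy %>%
  group_by(flag) %>%
  summarize(median_victims = median(victims), n = n())
```

Note that the `NA` row silently disappears from `n_match` but shows up as its own group in `by_flag`, so decide explicitly how you want to treat missing values.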
While you are practicing exploratory data analysis, your final graphs should be appropriate for sharing with outsiders. That means your graphs should have:
- A title
- Labels on the axes (see
When presenting tabular data (aka dplyr::summarize()), make sure you format it correctly. Use the kable() function from the knitr package to format the table for the final document. For instance, this is a poorly presented table summarizing where gun deaths occurred:
## # A tibble: 6 × 2
##   location_type     n
##   <chr>         <int>
## 1 Airport           1
## 2 Military          5
## 3 Other            47
## 4 Religious         6
## 5 School           17
## 6 Workplace        38
Instead, use kable() to format the table, add a caption, and label the columns:
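Table 1: Mass shootings in the United States (1982-2019), by location

|Location  | Number of incidents|
|:---------|-------------------:|
|Airport   |                   1|
|Military  |                   5|
|Other     |                  47|
|Religious |                   6|
|School    |                  17|
|Workplace |                  38|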
Run ?kable in the console to see additional options.
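As a rough sketch of what that kable() call could look like, the code below rebuilds the location counts from the example output above by hand (in your own document they would come from your summarized data frame) and passes them to kable() with a caption and column labels. The object names `by_location` and `tbl` are just placeholders.

```r
library(knitr)

# Counts typed in from the example output above; in practice this
# data frame would be the result of your own summary
by_location <- data.frame(
  location_type = c("Airport", "Military", "Other",
                    "Religious", "School", "Workplace"),
  n = c(1, 5, 47, 6, 17, 38)
)

# col.names relabels the columns; caption adds the table title
tbl <- kable(
  by_location,
  col.names = c("Location", "Number of incidents"),
  caption = "Mass shootings in the United States (1982-2019), by location"
)
tbl
```

Inside an R Markdown chunk, the table is rendered in the knitted document; knitr generally adds the table numbering itself, so the caption text does not need to start with "Table 1".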
Submit the assignment
Your assignment should be submitted as an R Markdown document using the github_document format. Don’t know what an R Markdown document is? Read this! Or this! I have included starter files for you to modify to complete the assignment, so you are not beginning completely from scratch.
Follow instructions on homework workflow. As part of the pull request, you’re encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc.
Make sure to stage and commit:
- mass-shootings_files/ - this folder contains all the graphs you generated in your R Markdown document
Needs improvement: Displays minimal effort. Doesn’t complete all components. Code is poorly written and not documented. Uses the same type of plot for each graph, or doesn’t use plots appropriate for the variables being analyzed. No record of commits other than the final push to GitHub.
Satisfactory: Solid effort. Hits all the elements. No clear mistakes. Easy to follow (both the code and the output). Nothing spectacular, either bad or good.
Excellent: Finished all components of the assignment correctly. Code is well-documented (both self-documented and with additional comments as necessary). Graphs and tables are properly labeled. Uses multiple commits to back up and show a progression in the work. Analysis is clear and easy to follow, either because graphs are labeled clearly or you’ve written additional text to describe how you interpret the output.