I have a data frame that lists individual mass shootings for each state between 1991-2020. I would like to 1) sum the total victims each year for each state, and 2) sum the total number of mass shootings each state had each year.
So far, I've only managed to get a total sum of victims between 1991-2020 for each state. And I'm not even sure how I could get a column with the total incidents per year, per state. Are there any adjustments I can make to the aggregate
function, or is there some other function to get the information I want?
What I have:
combined = read.csv('https://raw.githubusercontent.com/bandcar/massShootings/main/combo1991_2020_states.csv')
> head(combined)
state date year fatalities injured total_victims
3342 Alabama 04/07/2009 2009 4 0 4
3351 Alabama 03/10/2009 2009 10 6 16
3285 Alabama 01/29/2012 2012 5 0 5
135 Alabama 12/28/2013 2013 3 5 8
267 Alabama 07/06/2013 2013 0 4 4
557 Alabama 06/08/2014 2014 1 4 5
q = aggregate(total_victims ~ state,data=combined,FUN=sum)
> head(q)
state total_victims
1 Alabama 364
2 Alaska 19
3 Arizona 223
4 Arkansas 205
5 California 1816
6 Colorado 315
What I want for each state for each year:
year state total_victims total_shootings
1 2009 Alabama 20 2
2 2012 Alabama 5 1
3 2013 Alabama 12 2
4 2014 Alabama 5 1
CodePudding user response:
You can use group_by
in combination with summarise()
from the tidyverse packages.
library(tidyverse)
combined |>
group_by(state, year) |>
summarise(total_victims = sum(total_victims),
total_shootings = n())
This is the result you get:
# A tibble: 457 x 4
# Groups: state [52]
state year total_victims total_shootings
<chr> <int> <int> <int>
1 Alabama 2009 20 2
2 Alabama 2012 5 1
3 Alabama 2013 12 2
4 Alabama 2014 10 2
5 Alabama 2015 17 4