I'm struggling to perform basic indexing on a data frame to perform mathematical operations. I have a data frame containing all 50 US states with an entry for each month of the year, so there are 600 observations. I wish to find the difference between a value for the month of December minus the January value for each of the states. My data looks like this:
> head(df)
state year month value
1 AL 2020 01 2.7
2 AK 2020 01 5
3 AZ 2020 01 4.8
4 AR 2020 01 3.7
5 CA 2020 01 4.2
7 CO 2020 01 2.7
For instance, AL has a value in Dec of 4.7 and Jan value of 2.7 so I'd like to return 2 for that state.
I have been trying to do this with the group_by and summarize functions, but can't figure out the indexing piece of it to grab values that correspond to a condition. I couldn't find a resource for performing these mathematical operations using indexing on a data frame, and would appreciate assistance as I have other transformations I'll be using.
CodePudding user response:
With dplyr
:
library(dplyr)
df %>%
group_by(state) %>%
summarize(year_change = value[month == "12"] - value[month == "01"])
This assumes that your data is as you describe--every state has a single value for every month. If you have missing rows, or multiple observations in for a state in a given month, I would not expect this code to work.
Another approach, based row order rather than month value, might look like this:
library(dplyr)
df %>%
## make sure things are in the right order
arrange(state, month) %>%
group_by(state) %>%
summarize(year_change = last(value) - first(value))