Home > Software design >  Using indexing to perform mathematical operations on data frame in r
Using indexing to perform mathematical operations on data frame in r

Time:11-09

I'm struggling to perform basic indexing on a data frame to perform mathematical operations. I have a data frame containing all 50 US states with an entry for each month of the year, so there are 600 observations. I wish to find the difference between a value for the month of December minus the January value for each of the states. My data looks like this:

> head(df)
  state year month             value
1    AL 2020    01               2.7
2    AK 2020    01                 5
3    AZ 2020    01               4.8
4    AR 2020    01               3.7
5    CA 2020    01               4.2
7    CO 2020    01               2.7

For instance, AL has a value in Dec of 4.7 and Jan value of 2.7 so I'd like to return 2 for that state.

I have been trying to do this with the group_by and summarize functions, but can't figure out the indexing piece of it to grab values that correspond to a condition. I couldn't find a resource for performing these mathematical operations using indexing on a data frame, and would appreciate assistance as I have other transformations I'll be using.

CodePudding user response:

With dplyr:

library(dplyr)
df %>%
  group_by(state) %>%
  summarize(year_change = value[month == "12"] - value[month == "01"])

This assumes that your data is as you describe--every state has a single value for every month. If you have missing rows, or multiple observations in for a state in a given month, I would not expect this code to work.

Another approach, based row order rather than month value, might look like this:

library(dplyr)
df %>%
  ## make sure things are in the right order
  arrange(state, month) %>% 
  group_by(state) %>%
  summarize(year_change = last(value) - first(value))
  •  Tags:  
  • r
  • Related