dplyr function on grouped data to get the value of a variable when a different variable is equal to


I have the following dataset:

id <- c(1,1,1,2,2,2)
year <- c(2012, 2013, 2014, 2012, 2013, 2014)
assignment <- c(0,1,1,0,0,0)
df <- cbind.data.frame(id, year, assignment)

I want to use mutate() to create a new variable that, within each id group, takes the value assignment had in that group's earliest year. For example, for ID 1 the value of assignment in the smallest year (2012) is 0, so the new variable should be 0 in every row for ID 1.

CodePudding user response:

You could group by id, arrange by year, and take the first value of assignment within each group:

library(dplyr, warn.conflicts = FALSE)

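df %>% 
  group_by(id) %>% 
  arrange(year) %>%   # arrange() ignores the grouping and sorts the whole table by year
  mutate(new_assignment = first(assignment)) %>%   # first value per id, i.e. the one from the earliest year
  ungroup()           # drop the grouping so the result prints as a plain tibble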
#> # A tibble: 6 × 4
#>      id  year assignment new_assignment
#>   <dbl> <dbl>      <dbl>          <dbl>
#> 1     1  2012          0              0
#> 2     2  2012          0              0
#> 3     1  2013          1              0
#> 4     2  2013          0              0
#> 5     1  2014          1              0
#> 6     2  2014          0              0
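
If you would rather keep the rows in their original order, one possible variant (a minimal sketch, not part of the answer above) is to index assignment by the position of the smallest year within each group. It rebuilds the question's df with data.frame() so that it runs on its own; which.min() returns the first position if a year is repeated within an id.

```r
library(dplyr, warn.conflicts = FALSE)

# Same data as in the question, rebuilt so the example is self-contained
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  year = c(2012, 2013, 2014, 2012, 2013, 2014),
  assignment = c(0, 1, 1, 0, 0, 0)
)

result <- df %>%
  group_by(id) %>%
  # pick the assignment value at the row holding the group's smallest year;
  # the single value is recycled across all rows of the group
  mutate(new_assignment = assignment[which.min(year)]) %>%
  ungroup()

print(result)
```

Because nothing is sorted here, the rows stay grouped by id as they were in df.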

CodePudding user response:

Does this work for you?

df %>% 
  arrange(id, year) %>%          # sort by id, then by year
  group_by(id) %>%  
  slice_head(n = 1) %>%          # first row per id, which holds the minimum year after sorting
  select(id, assignment) %>%     # keep only id and the assignment value to propagate
  left_join(df, by = "id") %>%   # join back onto the full data; the two assignment columns get .x/.y suffixes
  rename("assignment_new" = "assignment.x",
         "assignment" = "assignment.y")
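
The same join idea can probably be written without the sort and the suffix rename. The sketch below is an assumption-laden variant rather than the answerer's code: it uses slice_min() (available in dplyr 1.0.0 and later) to pick the earliest-year row per id, and the helper name first_year is made up for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Same data as in the question, rebuilt so the example is self-contained
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  year = c(2012, 2013, 2014, 2012, 2013, 2014),
  assignment = c(0, 1, 1, 0, 0, 0)
)

# One row per id: the assignment value from the earliest year
# (first_year is a hypothetical helper name)
first_year <- df %>%
  group_by(id) %>%
  slice_min(year, n = 1, with_ties = FALSE) %>%  # with_ties = FALSE guarantees a single row per id
  ungroup() %>%
  select(id, assignment_new = assignment)        # rename while selecting, so no .x/.y suffixes arise

# Attach the per-id value to every row of the original data
result <- left_join(df, first_year, by = "id")

print(result)
```

Renaming inside select() before the join means the original assignment column is never duplicated, so no cleanup with rename() is needed afterwards.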