I'm trying to use a separate dataframe to match values to a focal dataframe. I can easily do this with a for
loop but trying to do it all in a dplyr::case_when()
to be consistent with the rest of my code.
Here are the two dataframes I'm using:
df_1 = data.frame(
year = rep(c(2001:2020), 5),
area = c(rep("8", 20), rep("9", 20), rep("10", 20), rep("11", 20), rep("12", 20))
)
df_2 = data.frame(
val = rnorm(20, mean = 10, sd = 0.5),
year = rep(c(2001:2020))
)
I want to make a new column in df_1
where for all areas and years less than 2002, the value is zero, for all years in areas != 12, the value is zero, and for >2002 in area 12, the value comes from df_2
.
Here's my current solution:
# match up the years
df_1_some_matched = df_1 %>%
dplyr::mutate(
new = ifelse(
area == "12" & year < 2002,
0,
ifelse(
area != 12,
0,
NA
)
)
)
# now to go through the rest:
for(yr in 2002:2020) {
# get the subset
df_1[which(df_1$area == "12" & df_1$year == yr), "new"] =
# find the value from the other dataset
df_2[which(df_2$year == yr, "val"]
}
My attempt:
## attempt in case_when()
df_1_case_matched = df_1 %>%
dplyr::mutate(
new = dplyr::case_when(
area == "12" & year < 2002 ~ 0,
area != 12 ~ 0,
#### THIS IS THE LINE IM MISSING
area == 12 & year == ????????????????
)
)
CodePudding user response:
This can actually be accomplished by merging a dplyr::case_when()
with a dplyr::left_join()
and then rbind()
ing the result. However, there might be a better more concise dplyr
option, so I'd love to hear others' thoughts
rbind(
df_1 %>%
dplyr::filter((area != "12") | (area == "12" & year < 2002)) %>%
dplyr::mutate(
dplyr::case_when(
area == "12" & year < 2002 ~ 0,
area != 12 ~ 0
)
),
dplyr::left_join(
df_1 %>%
filter(area == "12" & year > 2002),
df_2,
by = c("year")
)
)
CodePudding user response:
I'd go with something like this:
df_1 %>% left_join(df_2, by='year') %>%
mutate(new = if_else(area == 12 & year > 2002,val, 0)) %>% select(-val)
CodePudding user response:
Easy
df_1 %>% left_join(df_2, by='year') %>% mutate(new = if_else(area == 12 & year > 2002,val, 0)) %>% select("your val")