I have two data frames, namely a
and b
:
a
---------------
id age group age_diff
1 10 x 2
1 11 y 1
2 4 x NA
2 5 y NA
3 9 x NA
3 11 y NA
---------------
and
b:
--------------
id group base_age
1 x 12
1 y 12
2 x 15
2 y 13
3 x 20
3 y 22
I need to fill NA in the data frame a
with the difference between the age
in a
and base_age
in b
of the corresponding id
and group
. For example, for id=2, group=x
in a
, age_diff
needs to be 15-4 = 11.
Desired Result:
a
---------------
id age group age_diff
1 10 x 2
1 11 y 1
2 4 x 11
2 5 y 8
3 9 x 11
3 11 y 11
---------------
I have
a <- a %>% mutate(age_diff = ifelse(is.na(age_diff), --- , age_diff))
but I'm not sure what to put in the ---. Any inputs would be very appreciated.
CodePudding user response:
new
coloumn is the result.
df1 %>% mutate(new = if_else(is.na(df1$age_diff), (df2$base_age - df1$age), df1$age_diff))
id age group age_diff new
1 1 10 x 2 2
2 1 11 y 1 1
3 2 4 x NA 11
4 2 5 y NA 8
5 3 9 x NA 11
6 3 11 y NA 11
Data:
df1 = structure(list(id = c(1, 1, 2, 2, 3, 3), age = c(10, 11, 4, 5,
9, 11), group = c("x", "y", "x", "y", "x", "y"), age_diff = c(2,
1, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))
df2 = structure(list(id = c(1, 1, 2, 2, 3, 3), group = c("x", "y",
"x", "y", "x", "y"), base_age = c(12, 12, 15, 13, 20, 22)), class = "data.frame", row.names = c(NA,
-6L))
CodePudding user response:
Does this work? The difference will hold good regardless of na
in dataframe a.
library(dplyr)
a %>% inner_join(b, by = c('id','group')) %>%
mutate(age_diff = base_age - age) %>% select(-base_age)
id age group age_diff
1 1 10 x 2
2 1 11 y 1
3 2 4 x 11
4 2 5 y 8
5 3 9 x 11
6 3 11 y 11
Data used:
a
id age group age_diff
1 1 10 x 2
2 1 11 y 1
3 2 4 x NA
4 2 5 y NA
5 3 9 x NA
6 3 11 y NA
b
id group base_age
1 1 x 12
2 1 y 12
3 2 x 15
4 2 y 13
5 3 x 20
6 3 y 22