I have a dataframe with two persons in three points of time (3x "id" == 1 and 3x "id" == 2):
id <- c(1, 1, 1, 2, 2, 2)
id2 <- c(NA, NA, NA, 1, 1, 1)
x <- c(4, 5, 5, 1, 1, 1)
dat1 <- data.frame(id, id2, x)
dat1
id id2 x
1 1 NA 4
2 1 NA 5
3 1 NA 5
4 2 1 1
5 2 1 1
6 2 1 1
Now i want to create a new variable "y" with following rule: If "id2" is not NA, "y" should be the value of "x" that occurs most often for the person with "id2" == "id". In this example data: For all points in time, the person with "id" == 2 gets a 5 in "y", because person 2 has a 1 in "id2" and 5 is the number that occurs most often for the person with "id" == 1. Since "id2" is NA for person 1, "y" will be NA aswell (there is no other person to refer to for person 1). Result is:
id id2 x y
1 1 NA 4 NA
2 1 NA 5 NA
3 1 NA 5 NA
4 2 1 1 5
5 2 1 1 5
6 2 1 1 5
Is there a way to do this with dplyr?
CodePudding user response:
We may find the Mode
grouped by 'id', then match
the 'id2' with 'id' and replace with the 'Mode' values
library(dplyr)
dat1 %>%
group_by(id) %>%
mutate(tmp = Mode(x)) %>%
ungroup %>%
mutate(y= tmp[match(id2, id)], tmp = NULL)
where
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}