Creating new variables with links between cases via IDs (longitudinal data)

I have a data frame with two persons observed at three points in time (three rows with "id" == 1 and three rows with "id" == 2):

id <- c(1, 1, 1, 2, 2, 2)
id2 <- c(NA, NA, NA, 1, 1, 1)
x <- c(4, 5, 5, 1, 1, 1)
dat1 <- data.frame(id, id2, x)
dat1

  id id2 x
1  1  NA 4
2  1  NA 5
3  1  NA 5
4  2   1 1
5  2   1 1
6  2   1 1

Now I want to create a new variable "y" with the following rule: if "id2" is not NA, "y" should be the value of "x" that occurs most often for the person whose "id" equals that "id2". In this example, the person with "id" == 2 gets a 5 in "y" at all points in time, because person 2 has a 1 in "id2", and 5 is the most frequent value of "x" for the person with "id" == 1. Since "id2" is NA for person 1, "y" is NA as well (there is no other person to refer to). The result is:

  id id2 x y
1  1  NA 4 NA
2  1  NA 5 NA
3  1  NA 5 NA
4  2   1 1 5
5  2   1 1 5
6  2   1 1 5
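For reference, the "most frequent value" part of the rule can be checked by hand in base R with `table()`. This is only a sanity check of the expected 5 for person 1, not a solution; the variable names `counts` and `mode_id1` are illustrative.

```r
# Recreate the example data from the question
dat1 <- data.frame(id  = c(1, 1, 1, 2, 2, 2),
                   id2 = c(NA, NA, NA, 1, 1, 1),
                   x   = c(4, 5, 5, 1, 1, 1))

# Frequency table of x for the person with id == 1 (4 appears once, 5 twice)
counts <- table(dat1$x[dat1$id == 1])

# table() stores the values as character names, so convert back to numeric
mode_id1 <- as.numeric(names(which.max(counts)))
mode_id1
#> [1] 5
```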

Is there a way to do this with dplyr?

Answer:

We can compute the mode of 'x' within each 'id', then `match` 'id2' against 'id' and use the resulting row positions to pick the corresponding mode values:
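To make the lookup step concrete, here is what `match(id2, id)` and the subsequent indexing produce on the example vectors. The `tmp` vector is hard-coded for illustration to the per-row mode values the grouped `mutate()` would create (5 for every row of person 1, 1 for every row of person 2).

```r
id  <- c(1, 1, 1, 2, 2, 2)
id2 <- c(NA, NA, NA, 1, 1, 1)
tmp <- c(5, 5, 5, 1, 1, 1)  # illustrative: mode of x within each id, one value per row

# Position of the first row whose id equals each id2;
# NA stays NA here because id itself contains no NA
pos <- match(id2, id)
pos
#> [1] NA NA NA  1  1  1

# Indexing with those positions pulls the referenced person's mode
tmp[pos]
#> [1] NA NA NA  5  5  5
```

Taking the first matching row is enough because `tmp` is constant within each 'id'.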

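library(dplyr)
dat1 %>% 
    group_by(id) %>%
    mutate(tmp = Mode(x)) %>%   # mode of x within each id, repeated on every row
    ungroup() %>%
    mutate(y = tmp[match(id2, id)], tmp = NULL)   # look up the mode of person id2, then drop tmp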

where `Mode` (which has to be defined before the pipeline is run) returns the most frequent value:

Mode <- function(x) {
  ux <- unique(x)                          # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]    # count each value; ties go to the first one seen
}
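A possible alternative, swapped in here rather than taken from the answer above, is to summarise the mode once per person and then attach it with a `left_join()` on `id2 = id`. This is a sketch assuming dplyr is installed; the intermediate names `modes` and `res` are illustrative. Rows whose 'id2' is NA find no partner in `modes` (which has no NA ids), so their 'y' stays NA.

```r
library(dplyr)

# Same helper as above: most frequent value, ties go to the first one seen
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

dat1 <- data.frame(id  = c(1, 1, 1, 2, 2, 2),
                   id2 = c(NA, NA, NA, 1, 1, 1),
                   x   = c(4, 5, 5, 1, 1, 1))

# One row per person holding that person's most frequent x
modes <- dat1 %>%
  group_by(id) %>%
  summarise(y = Mode(x))

# Attach the referenced person's mode by joining id2 to id;
# left_join() keeps every row of dat1 in its original order
res <- dat1 %>%
  left_join(modes, by = c("id2" = "id"))
res
```

Compared with the `match()` version, the join does not rely on row positions, which may be easier to read when the lookup table is reused elsewhere.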