Home > OS >  Creating new variables with links between cases via id's (longitudinal data)
Creating new variables with links between cases via id's (longitudinal data)

Time:11-18

I have a dataframe with two persons in three points of time (3x "id" == 1 and 3x "id" == 2):

id <- c(1, 1, 1, 2, 2, 2)
id2 <- c(NA, NA, NA, 1, 1, 1)
x <- c(4, 5, 5, 1, 1, 1)
dat1 <- data.frame(id, id2, x)
dat1

  id id2 x
1  1  NA 4
2  1  NA 5
3  1  NA 5
4  2   1 1
5  2   1 1
6  2   1 1

Now i want to create a new variable "y" with following rule: If "id2" is not NA, "y" should be the value of "x" that occurs most often for the person with "id2" == "id". In this example data: For all points in time, the person with "id" == 2 gets a 5 in "y", because person 2 has a 1 in "id2" and 5 is the number that occurs most often for the person with "id" == 1. Since "id2" is NA for person 1, "y" will be NA aswell (there is no other person to refer to for person 1). Result is:

  id id2 x y
1  1  NA 4 NA
2  1  NA 5 NA
3  1  NA 5 NA
4  2   1 1 5
5  2   1 1 5
6  2   1 1 5

Is there a way to do this with dplyr?

CodePudding user response:

We may find the Mode grouped by 'id', then match the 'id2' with 'id' and replace with the 'Mode' values

library(dplyr)
dat1 %>% 
    group_by(id) %>%
    mutate(tmp = Mode(x)) %>% 
    ungroup %>%
    mutate(y= tmp[match(id2, id)], tmp = NULL)

where

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
  • Related