Is it possible to make groups based on an ID of a person in R?

Time:09-28

I have this data:

data <- data.frame(id_pers=c(4102,13102,27101,27102,28101,28102, 42101,42102,56102,73102,74103,103104,117103,117104,117105),
                   birthyear=c(1992,1994,1993,1992,1995,1999,2000,2001,2000, 1994, 1999, 1978, 1986, 1998, 1999))

I want to group the persons by family in a new column, so that persons 27101 and 27102 (siblings) are in group/family 1, 42101 and 42102 are in group 2, 117103, 117104 and 117105 are in group 3, and so on. Person 4102 has no siblings and should be NA in the new column. Two or more persons are always siblings if their IDs are no more than 6 apart. My real dataset is far larger, with over 3000 rows. What is the most efficient way to do this?

CodePudding user response:

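You can use round() with digits = -1, which rounds to the nearest multiple of 10. Subtracting 5 first shifts each ID down so that IDs sharing the same tens block get the same fam_id: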

library(dplyr)

data %>%
  group_by(fam_id = round(id_pers - 5, digits = -1))

output

# A tibble: 15 × 3
# Groups:   fam_id [10]
   id_pers birthyear fam_id
     <dbl>     <dbl>  <dbl>
 1    4102      1992   4100
 2   13102      1994  13100
 3   27101      1993  27100
 4   27102      1992  27100
 5   28101      1995  28100
 6   28102      1999  28100
 7   42101      2000  42100
 8   42102      2001  42100
 9   56102      2000  56100
10   73102      1994  73100
11   74103      1999  74100
12  103104      1978 103100
13  117103      1986 117100
14  117104      1998 117100
15  117105      1999 117100
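
The output above gives every person a fam_id, including those without siblings, whereas the question asks for NA in that case and for groups numbered 1, 2, 3, and so on. A minimal sketch of one way to get there with dplyr follows. It assumes, as the answer above does, that siblings always share the same block of ten IDs. It swaps round() for integer division so that IDs ending in 0 do not sit on a rounding boundary. The column names fam_key and fam_size are helpers introduced here for illustration only.

```r
library(dplyr)

data <- data.frame(
  id_pers = c(4102, 13102, 27101, 27102, 28101, 28102, 42101, 42102,
              56102, 73102, 74103, 103104, 117103, 117104, 117105),
  birthyear = c(1992, 1994, 1993, 1992, 1995, 1999, 2000, 2001,
                2000, 1994, 1999, 1978, 1986, 1998, 1999)
)

result <- data %>%
  # Assumption: siblings share the same block of ten IDs,
  # so flooring to a multiple of 10 gives a family key (27102 -> 27100).
  mutate(fam_key = (id_pers %/% 10) * 10) %>%
  # Count how many persons share each key.
  add_count(fam_key, name = "fam_size") %>%
  mutate(
    # Persons without siblings get NA.
    fam_id = if_else(fam_size > 1, fam_key, NA_real_),
    # Number the remaining families 1, 2, 3, ... in order of appearance.
    fam_id = match(fam_id, unique(fam_id[!is.na(fam_id)]))
  ) %>%
  select(-fam_key, -fam_size)

result
```

With the sample data, 28101 and 28102 also fall into one block, so they form a family of their own between the 271xx and 421xx groups.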

CodePudding user response:

It looks like we can use the thousands digit (and above) to delineate groups.

library(dplyr)
data %>%
  mutate(
    famgroup = trunc(id_pers/1000),
    famgroup = match(famgroup, unique(famgroup))
  )
#    id_pers birthyear famgroup
# 1     4102      1992        1
# 2    13102      1994        2
# 3    27101      1993        3
# 4    27102      1992        3
# 5    28101      1995        4
# 6    28102      1999        4
# 7    42101      2000        5
# 8    42102      2001        5
# 9    56102      2000        6
# 10   73102      1994        7
# 11   74103      1999        8
# 12  103104      1978        9
# 13  117103      1986       10
# 14  117104      1998       10
# 15  117105      1999       10
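
Both answers derive the family from the digits of the ID. The question states the rule differently: persons are siblings when their IDs are no more than 6 apart. A rough base R sketch of that rule is shown below. It sorts the IDs, starts a new cluster wherever the gap to the previous ID exceeds the limit, and sets clusters with a single member to NA. The function group_by_gap and its max_gap argument are hypothetical names introduced here, not part of either answer. Note that gaps are chained, so IDs such as 1, 6 and 11 would land in one cluster even though the first and last are 10 apart.

```r
data <- data.frame(
  id_pers = c(4102, 13102, 27101, 27102, 28101, 28102, 42101, 42102,
              56102, 73102, 74103, 103104, 117103, 117104, 117105),
  birthyear = c(1992, 1994, 1993, 1992, 1995, 1999, 2000, 2001,
                2000, 1994, 1999, 1978, 1986, 1998, 1999)
)

# Hypothetical helper: cluster IDs whose neighbours are at most max_gap apart.
group_by_gap <- function(ids, max_gap = 6) {
  if (length(ids) == 0) return(integer(0))
  ord <- order(ids)
  sorted <- ids[ord]
  # A new cluster starts wherever the gap to the previous ID exceeds max_gap.
  cluster <- cumsum(c(TRUE, diff(sorted) > max_gap))
  # Clusters with a single member are not families.
  size <- tabulate(cluster)[cluster]
  cluster[size < 2] <- NA
  # Renumber the remaining clusters 1, 2, 3, ...
  cluster <- match(cluster, unique(cluster[!is.na(cluster)]))
  # Put the labels back in the original row order.
  out <- integer(length(ids))
  out[ord] <- cluster
  out
}

data$famgroup <- group_by_gap(data$id_pers)
data
```

Everything here is vectorised, so a few thousand rows should not be a problem. The input does not need to be sorted, because the labels are mapped back to the original row order at the end.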