Home > Software design >  get id's of multiple rows in one cell in r data.table
get id's of multiple rows in one cell in r data.table

Time:11-09

I have a data.table in long format:

dt <- data.table(id=1:3, Q1=c(1,2,1), Q2=c(3,1,3), Q3=c("a","b","a"), Q4=c("d","c","d"))
dt
   id Q1 Q2 Q3 Q4
1:  1  1  3  a  d
2:  2  2  1  b  c
3:  3  1  3  a  d

If I do

unique(dt[,Q1:Q4])
   Q1 Q2 Q3 Q4
1:  1  3  a  d
2:  2  1  b  c

I get unique rows, but I lose the id's information. I would want to keep these in one row only, like this:

    id Q1 Q2 Q3 Q4
1: 1,3  1  3  a  d
2:   2  2  1  b  c

How could I achieve this?

Thank you

CodePudding user response:

dt[, .(id = paste0(id, collapse = ";")), by = .(Q1, Q2, Q3, Q4)]
#    Q1 Q2 Q3 Q4  id
# 1:  1  3  a  d 1;3
# 2:  2  1  b  c   2

CodePudding user response:

I suggest a list-column for this, in case you ever want to have distinct ids again. This is not very different from Wimpel's answer, it's just replacing paste with list, but it's different enough of a concept to break out:

dt2 <- dt[, .(id = list(id)), by = .(Q1, Q2, Q3, Q4)]
dt2
#       Q1    Q2     Q3     Q4     id
#    <num> <num> <char> <char> <list>
# 1:     1     3      a      d    1,3
# 2:     2     1      b      c      2

With this, if you ever need to look for "membership", you can still use %in% (though you'd need to lapply or similar):

dt2[ sapply(id, `%in%`, x = 1), ]
#       Q1    Q2     Q3     Q4     id
#    <num> <num> <char> <char> <list>
# 1:     1     3      a      d    1,3

Otherwise, if you use something like toString or paste, you'd need to work with regexes to try to find membership.

  • Related