I have a data.table in long format:
dt <- data.table(id=1:3, Q1=c(1,2,1), Q2=c(3,1,3), Q3=c("a","b","a"), Q4=c("d","c","d"))
dt
id Q1 Q2 Q3 Q4
1: 1 1 3 a d
2: 2 2 1 b c
3: 3 1 3 a d
If I do
unique(dt[,Q1:Q4])
Q1 Q2 Q3 Q4
1: 1 3 a d
2: 2 1 b c
I get unique rows, but I lose the id's information. I would want to keep these in one row only, like this:
id Q1 Q2 Q3 Q4
1: 1,3 1 3 a d
2: 2 2 1 b c
How could I achieve this?
Thank you
CodePudding user response:
dt[, .(id = paste0(id, collapse = ";")), by = .(Q1, Q2, Q3, Q4)]
# Q1 Q2 Q3 Q4 id
# 1: 1 3 a d 1;3
# 2: 2 1 b c 2
CodePudding user response:
I suggest a list-column for this, in case you ever want to have distinct id
s again. This is not very different from Wimpel's answer, it's just replacing paste
with list
, but it's different enough of a concept to break out:
dt2 <- dt[, .(id = list(id)), by = .(Q1, Q2, Q3, Q4)]
dt2
# Q1 Q2 Q3 Q4 id
# <num> <num> <char> <char> <list>
# 1: 1 3 a d 1,3
# 2: 2 1 b c 2
With this, if you ever need to look for "membership", you can still use %in%
(though you'd need to lapply
or similar):
dt2[ sapply(id, `%in%`, x = 1), ]
# Q1 Q2 Q3 Q4 id
# <num> <num> <char> <char> <list>
# 1: 1 3 a d 1,3
Otherwise, if you use something like toString
or paste
, you'd need to work with regexes to try to find membership.