Home > Mobile >  R subset a data.table only criteria is met, by group
R subset a data.table only criteria is met, by group

Time:01-17

I would like to subset a data.table where a certain criteria is met, but keep another subset of the data if the criteria is not met, by group.

E.g. in the following example data, I would like to subset every ID group where year = 2020, but if 2020 does not exist in that ID group, just keep the data where year = 0 only

libary(data.table)

# dummy data
dt <- data.table(ID = c(rep("A",12), rep("B",12), rep("C",6)), 
year = c(rep(2020,6), rep(0,6), rep(2019,6), rep(0,12)), hour = c(rep(1:6,5)), N = sample(1:20,30, replace = T))

# desired result: 

 ID year hour  N
 1:  A 2020    1 12
 2:  A 2020    2 18
 3:  A 2020    3  2
 4:  A 2020    4  7
 5:  A 2020    5 18
 6:  A 2020    6  8
 7:  B    0    1 18
 8:  B    0    2 17
 9:  B    0    3  2
10:  B    0    4  3
11:  B    0    5  9
12:  B    0    6  5
13:  C    0    1 17
14:  C    0    2 19
15:  C    0    3  5
16:  C    0    4 11
17:  C    0    5 20
18:  C    0    6 15

I've only done this by subsetting twice and binding, e.g. ;

dt_res1 <- dt[year == 2020]

dt_res2 <- dt[!(ID %in% dt_res1[,ID])][year == 0]

rbindlist(list(dt_res1, dt_res2), use.names = T)

CodePudding user response:

This?

dt[, .SD[year == 2020 | (!2020 %in% year & year == 0),], by = ID]
#         ID  year  hour     N
#     <char> <num> <int> <int>
#  1:      A  2020     1    10
#  2:      A  2020     2     5
#  3:      A  2020     3     7
#  4:      A  2020     4    15
#  5:      A  2020     5     1
#  6:      A  2020     6    12
#  7:      B     0     1    10
#  8:      B     0     2     9
#  9:      B     0     3     3
# 10:      B     0     4    19
# 11:      B     0     5    15
# 12:      B     0     6     6
# 13:      C     0     1     9
# 14:      C     0     2     9
# 15:      C     0     3     3
# 16:      C     0     4    12
# 17:      C     0     5     2
# 18:      C     0     6    18
  • Related