I would like to subset a data.table
where a certain criteria is met, but keep another subset of the data if the criteria is not met, by group.
E.g. in the following example data, I would like to subset every ID group where year = 2020, but if 2020 does not exist in that ID group, just keep the data where year = 0 only
libary(data.table)
# dummy data
dt <- data.table(ID = c(rep("A",12), rep("B",12), rep("C",6)),
year = c(rep(2020,6), rep(0,6), rep(2019,6), rep(0,12)), hour = c(rep(1:6,5)), N = sample(1:20,30, replace = T))
# desired result:
ID year hour N
1: A 2020 1 12
2: A 2020 2 18
3: A 2020 3 2
4: A 2020 4 7
5: A 2020 5 18
6: A 2020 6 8
7: B 0 1 18
8: B 0 2 17
9: B 0 3 2
10: B 0 4 3
11: B 0 5 9
12: B 0 6 5
13: C 0 1 17
14: C 0 2 19
15: C 0 3 5
16: C 0 4 11
17: C 0 5 20
18: C 0 6 15
I've only done this by subsetting twice and binding, e.g. ;
dt_res1 <- dt[year == 2020]
dt_res2 <- dt[!(ID %in% dt_res1[,ID])][year == 0]
rbindlist(list(dt_res1, dt_res2), use.names = T)
CodePudding user response:
This?
dt[, .SD[year == 2020 | (!2020 %in% year & year == 0),], by = ID]
# ID year hour N
# <char> <num> <int> <int>
# 1: A 2020 1 10
# 2: A 2020 2 5
# 3: A 2020 3 7
# 4: A 2020 4 15
# 5: A 2020 5 1
# 6: A 2020 6 12
# 7: B 0 1 10
# 8: B 0 2 9
# 9: B 0 3 3
# 10: B 0 4 19
# 11: B 0 5 15
# 12: B 0 6 6
# 13: C 0 1 9
# 14: C 0 2 9
# 15: C 0 3 3
# 16: C 0 4 12
# 17: C 0 5 2
# 18: C 0 6 18