I have a data.table
as following:-
datazzz <- data.table(group = c(rep("a", times = 3),
rep("b", times = 4),
rep("c", times = 4),
rep("k", times = 2),
rep("f", times = 4)),
year = c(2017:2019, 2016:2019, 2016:2019, 2018, 2019,
2017:2020),
values = runif(17))
datazzz
group year values
1: a 2017 0.14710475
2: a 2018 0.23493958
3: a 2019 0.97570157
4: b 2016 0.82078366
5: b 2017 0.92685531
6: b 2018 0.64406726
7: b 2019 0.17611851
8: c 2016 0.96894329
9: c 2017 0.97501190
10: c 2018 0.49732578
11: c 2019 0.90125133
12: k 2018 0.14836372
13: k 2019 0.01368339
14: f 2017 0.84735620
15: f 2018 0.71688780
16: f 2019 0.62894310
17: f 2020 0.73526859
I want to select only those groups who have the year
s from 2016 till 2019. And hence, my resulting data.table
would look like
group year values
1: b 2016 0.82078366
2: b 2017 0.92685531
3: b 2018 0.64406726
4: b 2019 0.17611851
5: c 2016 0.96894329
6: c 2017 0.97501190
7: c 2018 0.49732578
8: c 2019 0.90125133
CodePudding user response:
The subsetting condition is that all years are present in the group. We can construct a variable V1
with that condition and select rows based on that by passing row indices through .I
.
datazzz[datazzz[, .I[all(2016:2019 %in% unique(year))], by = .(group)]$V1]
group year values
1: b 2016 0.86527048
2: b 2017 0.46478348
3: b 2018 0.94761731
4: b 2019 0.05005278
5: c 2016 0.73977484
6: c 2017 0.23698556
7: c 2018 0.29560906
8: c 2019 0.61450736
CodePudding user response:
We could do:
library(data.table)
setDT(datazzz)[, if(min(year) == 2016 & max(year)==2019) .SD, by = group]
group year values
1: b 2016 0.2321175
2: b 2017 0.2776979
3: b 2018 0.5695105
4: b 2019 0.7224908
5: c 2016 0.1904413
6: c 2017 0.4608467
7: c 2018 0.8258316
8: c 2019 0.7198854