Home > Back-end >  Drop those row groups which do not have specified year values
Drop those row groups which do not have specified year values

Time:03-08

I have the following table:-

data <- data.table(alpha = c(rep(letters[1:5], each=4), "f", "f", "g", "h", "h", rep("i", 4)),
     year = c(rep(2014:2017, 5), 2014, 2015, 2016, 2015, 2016, 2014:2017),
     values = runif(29))

data

 #    alpha year     values
 # 1:     a 2014 0.58602291
 # 2:     a 2015 0.84320915
 # 3:     a 2016 0.15504613
 # 4:     a 2017 0.81064753
 # 5:     b 2014 0.89886539
 # 6:     b 2015 0.81288998
 # 7:     b 2016 0.92019694
 # 8:     b 2017 0.64171626
 # 9:     c 2014 0.07585062
 #10:     c 2015 0.90201830
 #11:     c 2016 0.68783471
 #12:     c 2017 0.80405486
 #13:     d 2014 0.45197629
 #14:     d 2015 0.84657377
 #15:     d 2016 0.25700501
 #16:     d 2017 0.44786659
 #17:     e 2014 0.73255596
 #18:     e 2015 0.45027413
 #19:     e 2016 0.52385583
 #20:     e 2017 0.62541879
 #21:     f 2014 0.38690750
 #22:     f 2015 0.43807544
 #23:     g 2016 0.47070025
 #24:     h 2015 0.57167596
 #25:     h 2016 0.40591139
 #26:     i 2014 0.16081680
 #27:     i 2015 0.71446476
 #28:     i 2016 0.70543309
 #29:     i 2017 0.57190553

What I want to do is remove those alpha groups which do not have values for all the years, i.e., from 2014 till 2017.

The resulting data.table have only a, b, d, e and i group rows. Hence, it would look like the following data.table:-

 #    alpha year     values
 # 1:     a 2014 0.58602291
 # 2:     a 2015 0.84320915
 # 3:     a 2016 0.15504613
 # 4:     a 2017 0.81064753
 # 5:     b 2014 0.89886539
 # 6:     b 2015 0.81288998
 # 7:     b 2016 0.92019694
 # 8:     b 2017 0.64171626
 # 9:     c 2014 0.07585062
 #10:     c 2015 0.90201830
 #11:     c 2016 0.68783471
 #12:     c 2017 0.80405486
 #13:     d 2014 0.45197629
 #14:     d 2015 0.84657377
 #15:     d 2016 0.25700501
 #16:     d 2017 0.44786659
 #17:     e 2014 0.73255596
 #18:     e 2015 0.45027413
 #19:     e 2016 0.52385583
 #20:     e 2017 0.62541879
 #21:     i 2014 0.16081680
 #22:     i 2015 0.71446476
 #23:     i 2016 0.70543309
 #24:     i 2017 0.57190553

CodePudding user response:

We may use all with %in% grouped by 'alpha', and extract the sequences (.I) to subset

library(data.table)
data[data[, .I[all(2014:2017 %in% year)], by = alpha]$V1]

-output

  alpha  year     values
    <char> <num>      <num>
 1:      a  2014 0.07179522
 2:      a  2015 0.75057740
 3:      a  2016 0.93851481
 4:      a  2017 0.46446911
 5:      b  2014 0.09386143
 6:      b  2015 0.83745829
 7:      b  2016 0.45925816
 8:      b  2017 0.79701217
 9:      c  2014 0.52311675
10:      c  2015 0.68618138
11:      c  2016 0.98856152
12:      c  2017 0.54151560
13:      d  2014 0.10919439
14:      d  2015 0.86810873
15:      d  2016 0.07182192
16:      d  2017 0.38575802
17:      e  2014 0.88276318
18:      e  2015 0.01134039
19:      e  2016 0.99038433
20:      e  2017 0.59897147
21:      i  2014 0.07223494
22:      i  2015 0.79560440
23:      i  2016 0.05423041
24:      i  2017 0.24912163
  • Related