Home > Enterprise >  Count rows by grouping and reset counter on new group with datatable and R
Count rows by grouping and reset counter on new group with datatable and R

Time:11-25

I need to count the number of rows per data pool that a datatable gets. These require that the week column "week" and "Exist" must be filled in. I have the following data:

week_2020 <- seq(202001, 202015, 1)
week_2021 <- seq(202101, 202110, 1)

Exist <- c("TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE",
           NA,NA,NA,
           "TRUE","TRUE",NA,NA,"TRUE","TRUE","TRUE","TRUE",
           NA,NA,NA,
           "TRUE",NA, NA)

Year <- c(rep(2020,15),rep(2021,10) )

df<-data.table(Store = "store_1", Client = "client_1", 
               Year = Year,
               week = c(week_2020, week_2021),
               Exist = Exist)
Store Client Year Week Exist
Store 1 Client 1 2020 202001 TRUE
Store 1 Client 1 2020 202002 TRUE
Store 1 Client 1 2020 202003 TRUE
Store 1 Client 1 2020 202004 TRUE
Store 1 Client 1 2020 202005 TRUE
Store 1 Client 1 2020 202006 TRUE
Store 1 Client 1 2020 202007 TRUE
Store 1 Client 1 2020 202008 TRUE
Store 1 Client 1 2020 202009 NA
Store 1 Client 1 2020 202010 NA
Store 1 Client 1 2020 202011 NA
Store 1 Client 1 2020 202012 TRUE
Store 1 Client 1 2020 202013 TRUE
Store 1 Client 1 2020 202014 NA
Store 1 Client 1 2020 202015 NA
Store 1 Client 1 2021 202101 TRUE
Store 1 Client 1 2021 202102 TRUE
Store 1 Client 1 2021 202103 TRUE
Store 1 Client 1 2021 202104 TRUE
Store 1 Client 1 2021 202105 NA
Store 1 Client 1 2021 202106 NA
Store 1 Client 1 2021 202107 NA
Store 1 Client 1 2021 202108 TRUE
Store 1 Client 1 2021 202109 NA
Store 1 Client 1 2021 202110 NA

As shown in the table, some data in the "Exist" column is NA, which means that it does not exist, but that grouping should be counted as well.

I have created a variable that helps me to count the weeks until I find the lost weeks, and then I should count them and reset the counter, and another that does the max count "n_week_Count" but I can't do what I need. I hope you can help me to solve this problem. Thanks in advice

This is what i have...

df[, ':=' (n_weekCount = 1:.SD[,(.N)] ), keyby = c("Store", "Client", "Year", "Exist")
   ][, ':=' (MaxweekCount = .SD[, max(n_weekCount)]), keyby = c("Store", "Client", "Year", "Exist")
][order(week)]
Store Client Year Week Exist n_weekCount maxWeek_Count
Store 1 Client 1 2020 202001 TRUE 1 10
Store 1 Client 1 2020 202002 TRUE 2 10
Store 1 Client 1 2020 202003 TRUE 3 10
Store 1 Client 1 2020 202004 TRUE 4 10
Store 1 Client 1 2020 202005 TRUE 5 10
Store 1 Client 1 2020 202006 TRUE 6 10
Store 1 Client 1 2020 202007 TRUE 7 10
Store 1 Client 1 2020 202008 TRUE 8 10
Store 1 Client 1 2020 202009 NA 1 5
Store 1 Client 1 2020 202010 NA 2 5
Store 1 Client 1 2020 202011 NA 3 5
Store 1 Client 1 2020 202012 TRUE 9 10
Store 1 Client 1 2020 202013 TRUE 10 10
Store 1 Client 1 2020 202014 NA 4 5
Store 1 Client 1 2020 202015 NA 5 5
Store 1 Client 1 2021 202101 TRUE 1 10
Store 1 Client 1 2021 202102 TRUE 2 10
Store 1 Client 1 2021 202103 TRUE 3 10
Store 1 Client 1 2021 202104 TRUE 4 10
Store 1 Client 1 2021 202105 NA 1 5
Store 1 Client 1 2021 202106 NA 2 5
Store 1 Client 1 2021 202107 NA 3 5
Store 1 Client 1 2021 202108 TRUE 1 10
Store 1 Client 1 2021 202109 NA 4 5
Store 1 Client 1 2021 202110 NA 5 5

The desired result would be:

Store Client Year Week Exist n_weekCount maxWeek_Count
Store 1 Client 1 2020 202001 TRUE 1 8
Store 1 Client 1 2020 202002 TRUE 2 8
Store 1 Client 1 2020 202003 TRUE 3 8
Store 1 Client 1 2020 202004 TRUE 4 8
Store 1 Client 1 2020 202005 TRUE 5 8
Store 1 Client 1 2020 202006 TRUE 6 8
Store 1 Client 1 2020 202007 TRUE 7 8
Store 1 Client 1 2020 202008 TRUE 8 8
Store 1 Client 1 2020 202009 NA 1 3
Store 1 Client 1 2020 202010 NA 2 3
Store 1 Client 1 2020 202011 NA 3 3
Store 1 Client 1 2020 202012 TRUE 1 2
Store 1 Client 1 2020 202013 TRUE 2 2
Store 1 Client 1 2020 202014 NA 1 2
Store 1 Client 1 2020 202015 NA 2 2
Store 1 Client 1 2021 202101 TRUE 1 4
Store 1 Client 1 2021 202102 TRUE 2 4
Store 1 Client 1 2021 202103 TRUE 3 4
Store 1 Client 1 2021 202104 TRUE 4 4
Store 1 Client 1 2021 202105 NA 1 3
Store 1 Client 1 2021 202106 NA 2 3
Store 1 Client 1 2021 202107 NA 3 3
Store 1 Client 1 2021 202108 TRUE 1 1
Store 1 Client 1 2021 202109 NA 1 2
Store 1 Client 1 2021 202110 NA 2 2

CodePudding user response:

We may use rleid for grouping and create the columns (:=) with seq_len(.N) and group size (.N)

library(data.table)
df[, c("n_WeekCount", "maxWeek_Count") := .(seq_len(.N), .N),
      .(grp = rleid(Exist), Store, Client, Year)]

-output

> df
      Store   Client  Year   week  Exist n_WeekCount maxWeek_Count
     <char>   <char> <num>  <num> <char>       <int>         <int>
 1: store_1 client_1  2020 202001   TRUE           1             8
 2: store_1 client_1  2020 202002   TRUE           2             8
 3: store_1 client_1  2020 202003   TRUE           3             8
 4: store_1 client_1  2020 202004   TRUE           4             8
 5: store_1 client_1  2020 202005   TRUE           5             8
 6: store_1 client_1  2020 202006   TRUE           6             8
 7: store_1 client_1  2020 202007   TRUE           7             8
 8: store_1 client_1  2020 202008   TRUE           8             8
 9: store_1 client_1  2020 202009   <NA>           1             3
10: store_1 client_1  2020 202010   <NA>           2             3
11: store_1 client_1  2020 202011   <NA>           3             3
12: store_1 client_1  2020 202012   TRUE           1             2
13: store_1 client_1  2020 202013   TRUE           2             2
14: store_1 client_1  2020 202014   <NA>           1             2
15: store_1 client_1  2020 202015   <NA>           2             2
16: store_1 client_1  2021 202101   TRUE           1             4
17: store_1 client_1  2021 202102   TRUE           2             4
18: store_1 client_1  2021 202103   TRUE           3             4
19: store_1 client_1  2021 202104   TRUE           4             4
20: store_1 client_1  2021 202105   <NA>           1             3
21: store_1 client_1  2021 202106   <NA>           2             3
22: store_1 client_1  2021 202107   <NA>           3             3
23: store_1 client_1  2021 202108   TRUE           1             1
24: store_1 client_1  2021 202109   <NA>           1             2
25: store_1 client_1  2021 202110   <NA>           2             2
  • Related