Home > OS >  Count unique values before a row itself for each group in data.table
Count unique values before a row itself for each group in data.table

Time:06-22

I have a data.table like this;

df <- data.table(Date = c(seq.Date(from = as.Date('2022-01-01'),to = as.Date('2022-01-07'),by=1),
                          seq.Date(from = as.Date('2022-01-01'),to = as.Date('2022-01-07'),by=1)),
           Product = c(rep('A',7),rep('B',7)),
           Owner = c(c('X','X','Y','Y','Z','Z','Z'),c('M','M','M','M','N','O','O')))

Product is my group value here, I'd like to create a column which shows the owner of product before the current row.

I mean;

   Date       Product Owner BeforeOwnerCount
   <date>     <chr>   <chr>            <dbl>
 1 2022-01-01 A       X                    0
 2 2022-01-02 A       X                    0
 3 2022-01-03 A       Y                    1
 4 2022-01-04 A       Y                    1
 5 2022-01-05 A       Z                    2
 6 2022-01-06 A       Z                    2
 7 2022-01-07 A       Z                    2
 8 2022-01-01 B       M                    0
 9 2022-01-02 B       M                    0
10 2022-01-03 B       M                    0
11 2022-01-04 B       M                    0
12 2022-01-05 B       N                    1
13 2022-01-06 B       O                    2
14 2022-01-07 B       O                    2

dplyr verbs are also welcome.

Thanks in advance.

CodePudding user response:

Assuming the Date column is in chonological order.. (if not, key by Date first)

df[, BOC := rleid(Owner) - 1, by = Product]

          Date Product Owner BOC
 1: 2022-01-01       A     X   0
 2: 2022-01-02       A     X   0
 3: 2022-01-03       A     Y   1
 4: 2022-01-04       A     Y   1
 5: 2022-01-05       A     Z   2
 6: 2022-01-06       A     Z   2
 7: 2022-01-07       A     Z   2
 8: 2022-01-01       B     M   0
 9: 2022-01-02       B     M   0
10: 2022-01-03       B     M   0
11: 2022-01-04       B     M   0
12: 2022-01-05       B     N   1
13: 2022-01-06       B     O   2
14: 2022-01-07       B     O   2

CodePudding user response:

Using dplyr with factor:

library(dplyr)
library(data.table)
setDF(df) %>%
  group_by(Product) %>%
  mutate(BeforeOwnerCount = as.numeric(as.factor(Owner))-1)

Output:

# A tibble: 14 × 4
# Groups:   Product [2]
   Date       Product Owner BeforeOwnerCount
   <date>     <chr>   <chr>            <dbl>
 1 2022-01-01 A       X                    0
 2 2022-01-02 A       X                    0
 3 2022-01-03 A       Y                    1
 4 2022-01-04 A       Y                    1
 5 2022-01-05 A       Z                    2
 6 2022-01-06 A       Z                    2
 7 2022-01-07 A       Z                    2
 8 2022-01-01 B       M                    0
 9 2022-01-02 B       M                    0
10 2022-01-03 B       M                    0
11 2022-01-04 B       M                    0
12 2022-01-05 B       N                    1
13 2022-01-06 B       O                    2
14 2022-01-07 B       O                    2
  • Related