here is a sample of my data ;
df <- data.table(Date = c(as.Date('2021-02-09'),as.Date('2021-02-10'),as.Date('2021-02-12'),as.Date('2021-02-13'),
as.Date('2021-02-14'),as.Date('2021-02-05'),as.Date('2021-02-07'),as.Date('2021-02-08')),
Store = c('A','A','A','A','B','B','B','B'),
Product = c(1,1,1,1,2,2,2,2),
Quantity = c(3,4,5,6,7,8,9,10))
for each group (Store
and Product
columns represents a group) sometimes can have a break in the Date
column.
I want to keep them as missing in Quantity
value. Here is my desired output;
Date Store Product Quantity
<date> <chr> <dbl> <dbl>
2021-02-09 A 1 3
2021-02-10 A 1 4
2021-02-11 A 1 NA
2021-02-12 A 1 5
2021-02-13 A 1 6
2021-02-14 B 2 7
2021-02-05 B 2 8
2021-02-06 B 2 NA
2021-02-07 B 2 9
2021-02-08 A 2 10
note : I have a limited ram space and a huge table. So operations without reassigning (for example with :=
operator) would be way better.
Thanks in advance.
CodePudding user response:
A possible solution with J
and seq
.
Works for Store A, not sure I understood your question correctly for Store B as this creates more rows than your expected result due to Quantity registered on 2021-02-14
df[,.SD[J(Date=seq(min(Date),max(Date),by=1)),on=.(Date)],by=.(Store,Product)]
Store Product Date Quantity
<char> <num> <Date> <num>
1: A 1 2021-02-09 3
2: A 1 2021-02-10 4
3: A 1 2021-02-11 NA
4: A 1 2021-02-12 5
5: A 1 2021-02-13 6
6: B 2 2021-02-05 8
7: B 2 2021-02-06 NA
8: B 2 2021-02-07 9
9: B 2 2021-02-08 10
10: B 2 2021-02-09 NA
11: B 2 2021-02-10 NA
12: B 2 2021-02-11 NA
13: B 2 2021-02-12 NA
14: B 2 2021-02-13 NA
15: B 2 2021-02-14 7