I have the following data:
library(data.table)
D1 <- data.table(store = "store1",
client = "123456",
week = c(2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2007, 2010, 2011,2012, 2013, 2014, 2015),
sells = c(3434, NA, 6566, NA, 8788, 4343, NA , NA, NA, NA, NA, NA, NA, NA, NA))
D2 <- data.table(store = "store1",
client = "654321",
week = c(2001,2002,2003, 2004, 2005, 2006, 2007 , 2008, 2007, 2010, 2011, 2012, 2013, 2014, 2015),
sells = c(45455, 45454, 5454, NA, 65656, 5858, 43434, 55898, NA, NA, NA, NA, NA, NA, NA))
DT <- rbind(D1, D2)
For each store and client in DT, I want to keep rows until the last non-NA
value of sells, and remove the subsequent rows.
store client week sells
1: store1 123456 2001 3434
2: store1 123456 2002 NA
3: store1 123456 2003 6566
4: store1 123456 2004 NA
5: store1 123456 2005 8788
6: store1 123456 2006 4343 # last non-NA for store1, client 123456
7: store1 123456 2007 NA
8: store1 123456 2008 NA
9: store1 123456 2007 NA
10: store1 123456 2010 NA
11: store1 123456 2011 NA
12: store1 123456 2012 NA
13: store1 123456 2013 NA
14: store1 123456 2014 NA
15: store1 123456 2015 NA
16: store1 654321 2001 45455
17: store1 654321 2002 45454
18: store1 654321 2003 5454
19: store1 654321 2004 NA
20: store1 654321 2005 65656
21: store1 654321 2006 5858
22: store1 654321 2007 43434
23: store1 654321 2008 55898 # last non-NA for store1, client 654321
24: store1 654321 2007 NA
25: store1 654321 2010 NA
26: store1 654321 2011 NA
27: store1 654321 2012 NA
28: store1 654321 2013 NA
29: store1 654321 2014 NA
30: store1 654321 2015 NA
Desired result:
store client week sells
1: store1 123456 2001 3434
2: store1 123456 2002 NA
3: store1 123456 2003 6566
4: store1 123456 2004 NA
5: store1 123456 2005 8788
6: store1 123456 2006 4343
7: store1 654321 2001 45455
8: store1 654321 2002 45454
9: store1 654321 2003 5454
10: store1 654321 2004 NA
11: store1 654321 2005 65656
12: store1 654321 2006 5858
13: store1 654321 2007 43434
14: store1 654321 2008 55898
I have tried using .SD
together with which.max
to get the maximum week, but I can't solve it
DT[, .SD[which.max(week)], keyby = c("store", client")]
Thank you very much for your kind answers.
CodePudding user response:
Another attempt, using the .I
rows index to subset all rows with an index <=
to the last non-NA
:
DT[DT[, .I <= .I[last(which(!is.na(sells)))], by=.(store,client)]$V1]
CodePudding user response:
You're close, but which.max
is only going to return a single row (per group). Try this:
DT[, .SD[ rev(cumsum(rev(!is.na(sells))) > 0), ], by = .(store, client) ]
# store client week sells
# <char> <char> <num> <num>
# 1: store1 123456 2001 3434
# 2: store1 123456 2002 NA
# 3: store1 123456 2003 6566
# 4: store1 123456 2004 NA
# 5: store1 123456 2005 8788
# 6: store1 123456 2006 4343
# 7: store1 654321 2001 45455
# 8: store1 654321 2002 45454
# 9: store1 654321 2003 5454
# 10: store1 654321 2004 NA
# 11: store1 654321 2005 65656
# 12: store1 654321 2006 5858
# 13: store1 654321 2007 43434
# 14: store1 654321 2008 55898
The premise:
!is.na(sells)
returns true or false;- when a logical is summed, false is 0 and true is 1
cumsum(.) > 0
is effectively a rolling "if any", meaning that as soon as one value is true, the whole conditional remains true for the rest of the values- because we want to go from bottom-up, however, we need to do the double-
rev
; if you envision going from bottom-to-top in any particular group, this means "starting from the first (bottom-most) week with a non-NA
sells, keep that row and everything above it"
CodePudding user response:
You were close, but you can use .SD within .SD[] to do this:
DT[, .SD[1:which(week == .SD[!is.na(sells), max(week)])], keyby = c("store","client")]
Note: which.max doesn't work because of trying to remove the NA's after the last week with sells.