thanks for reading and for your help. I am trying to compose the information of a data.table. and it seems that I have not been successful, and little by little I have run out of ideas. I have used the data.table package to try to fix the problem, but have not had very good results.
I have the following datatables:
D1<-data.table(store = c("store1"),
client = c("123456"),
week = c(2001,2002,2003, 2004, 2005, 2006, 2007 , 2008, 2007, 2010, 2011,2012, 2013, 2014,2015),
sells = c(3434,NA,6566, NA, 8788,4343,NA , NA, NA, NA, NA,NA, NA, NA,NA))
D2<-data.table(store = c("store1"),
client = c("654321"),
week = c(2001,2002,2003, 2004, 2005, 2006, 2007 , 2008, 2007, 2010, 2011,2012, 2013, 2014,2015),
sells = c(45455,45454,5454, NA, 65656,5858,43434 , 55898, NA, NA, NA,NA, NA, NA,NA))
DT<-rbind(D1,D2)
my DT is:
store client week sells
1: store1 123456 2001 3434
2: store1 123456 2002 NA
3: store1 123456 2003 6566
4: store1 123456 2004 NA
5: store1 123456 2005 8788
6: store1 123456 2006 4343
7: store1 123456 2007 NA
8: store1 123456 2008 NA
9: store1 123456 2007 NA
10: store1 123456 2010 NA
11: store1 123456 2011 NA
12: store1 123456 2012 NA
13: store1 123456 2013 NA
14: store1 123456 2014 NA
15: store1 123456 2015 NA
16: store1 654321 2001 45455
17: store1 654321 2002 45454
18: store1 654321 2003 5454
19: store1 654321 2004 NA
20: store1 654321 2005 65656
21: store1 654321 2006 5858
22: store1 654321 2007 43434
23: store1 654321 2008 55898
24: store1 654321 2007 NA
25: store1 654321 2010 NA
26: store1 654321 2011 NA
27: store1 654321 2012 NA
28: store1 654321 2013 NA
29: store1 654321 2014 NA
30: store1 654321 2015 NA
I need to keep the last week of each store and client, and remove the NAs after the maximum week
store client week sells
1: store1 123456 2001 3434
2: store1 123456 2002 NA
3: store1 123456 2003 6566
4: store1 123456 2004 NA
5: store1 123456 2005 8788
6: store1 123456 2006 4343
7: store1 654321 2001 45455
8: store1 654321 2002 45454
9: store1 654321 2003 5454
10: store1 654321 2004 NA
11: store1 654321 2005 65656
12: store1 654321 2006 5858
13: store1 654321 2007 43434
14: store1 654321 2008 55898
I have tried to use the .SD function together with which.max() to get the maximums, but I can't solve it
max<-DT[, .SD[which.max(week)], keyby = c("store","client")]
Thank you very much for your kind answers.
CodePudding user response:
You're close, but which.max
is only going to return a single row (per group). Try this:
DT[, .SD[ rev(cumsum(rev(!is.na(sells))) > 0), ], by = .(store, client) ]
# store client week sells
# <char> <char> <num> <num>
# 1: store1 123456 2001 3434
# 2: store1 123456 2002 NA
# 3: store1 123456 2003 6566
# 4: store1 123456 2004 NA
# 5: store1 123456 2005 8788
# 6: store1 123456 2006 4343
# 7: store1 654321 2001 45455
# 8: store1 654321 2002 45454
# 9: store1 654321 2003 5454
# 10: store1 654321 2004 NA
# 11: store1 654321 2005 65656
# 12: store1 654321 2006 5858
# 13: store1 654321 2007 43434
# 14: store1 654321 2008 55898
The premise:
!is.na(sells)
returns true or false;- when a logical is summed, false is 0 and true is 1
cumsum(.) > 0
is effectively a rolling "if any", meaning that as soon as one value is true, the whole conditional remains true for the rest of the values- because we want to go from bottom-up, however, we need to do the double-
rev
; if you envision going from bottom-to-top in any particular group, this means "starting from the first (bottom-most) week with a non-NA
sells, keep that row and everything above it"
CodePudding user response:
Another attempt, using the .I
rows index to subset all rows with an index <=
to the last non-NA
:
DT[DT[, .I <= .I[last(which(!is.na(sells)))], by=.(store,client)]$V1]
CodePudding user response:
You were close, but you can use .SD within .SD[] to do this:
DT[, .SD[1:which(week == .SD[!is.na(sells), max(week)])], keyby = c("store","client")]
Note: which.max doesn't work because of trying to remove the NA's after the last week with sells.