I have a dataframe, with a column that increases with every row, and periodically (though not regularly) resets back to 1. I'd like to track/ count these resets in separate column. This for-loop example does exactly what I want, but is incredibly slow when applied to large datasets. Is there a better/ quicker/ more R way to do this same operation:
ColA<-seq(1,20)
ColB<-rep(seq(1,5),4)
DF<-data.frame(ColA, ColB)
DF$ColC<-NA
DF[1,'ColC']<-1
#Removing line 15 and changing line 5 to 1.1 per comments in answer
DF<-DF[-15,]
DF[5,2]<-0.1
for(i in seq(1,nrow(DF)-1)){
print(i)
MyRow<-DF[i 1,]
if(MyRow$ColB < DF[i,'ColB']){
DF[i 1,"ColC"]<-DF[i,"ColC"] 1
}else{
DF[i 1,"ColC"]<-DF[i,"ColC"]
}
}
CodePudding user response:
No need for a loop here. We can just use the vectorized cumsum
. This ought to be faster:
DF$ColC<-cumsum(DF$ColB==1)
DF
To keep using varying variable reset values that are always lower then the previous value, use cumsum(ColB < ColB)
:
DF %>% mutate(colC = cumsum(ColB < lag(ColB, default = Inf)))
ColA ColB ColC
1 1 1.0 1
2 2 2.0 1
3 3 3.0 1
4 4 4.0 1
5 5 0.1 2
6 6 1.0 2
7 7 2.0 2
8 8 3.0 2
9 9 4.0 2
10 10 5.0 2
11 11 1.0 3
12 12 2.0 3
13 13 3.0 3
14 14 4.0 3
16 16 1.0 4
17 17 2.0 4
18 18 3.0 4
19 19 4.0 4
20 20 5.0 4