Home > OS >  Add a column that iterates/ counts every time a sequence resets
Add a column that iterates/ counts every time a sequence resets

Time:11-26

I have a dataframe, with a column that increases with every row, and periodically (though not regularly) resets back to 1. I'd like to track/ count these resets in separate column. This for-loop example does exactly what I want, but is incredibly slow when applied to large datasets. Is there a better/ quicker/ more R way to do this same operation:

ColA<-seq(1,20)
ColB<-rep(seq(1,5),4)

DF<-data.frame(ColA, ColB)
DF$ColC<-NA

DF[1,'ColC']<-1

#Removing line 15 and changing line 5 to 1.1 per comments in answer
DF<-DF[-15,]
DF[5,2]<-0.1

for(i in seq(1,nrow(DF)-1)){
  
  print(i)
  
  MyRow<-DF[i 1,]
  
  if(MyRow$ColB < DF[i,'ColB']){
    DF[i 1,"ColC"]<-DF[i,"ColC"]  1
  }else{
    DF[i 1,"ColC"]<-DF[i,"ColC"]
  }
}

CodePudding user response:

No need for a loop here. We can just use the vectorized cumsum. This ought to be faster:

DF$ColC<-cumsum(DF$ColB==1)

DF

To keep using varying variable reset values that are always lower then the previous value, use cumsum(ColB < ColB):

DF %>% mutate(colC = cumsum(ColB < lag(ColB, default = Inf)))

   ColA ColB ColC
1     1  1.0    1
2     2  2.0    1
3     3  3.0    1
4     4  4.0    1
5     5  0.1    2
6     6  1.0    2
7     7  2.0    2
8     8  3.0    2
9     9  4.0    2
10   10  5.0    2
11   11  1.0    3
12   12  2.0    3
13   13  3.0    3
14   14  4.0    3
16   16  1.0    4
17   17  2.0    4
18   18  3.0    4
19   19  4.0    4
20   20  5.0    4
  • Related