Home > Back-end >  Count how many blocks in dataframe
Count how many blocks in dataframe

Time:06-26

I would like to count how many down and up trends there are in this dataframe containing simple moving average data:

Date        SMA_50     SMA_200    Trend Trend CumCount      T                                                     
2019-09-24  35.559013  38.942979  Down              1  
2019-09-25  35.427135  38.904934  Down              2            
2019-09-26  35.295066  38.864042  Down              3            
2019-09-27  35.165275  38.827087  Down              4            
2019-09-30  35.035484  38.788046  Down              5            
2020-08-31  34.697343  34.689469    Up              1    
2020-09-01  34.768881  34.691034    Up              2            
2020-09-02  34.852941  34.690655    Up              3            
2020-09-03  34.932827  34.686765    Up              4            
2020-09-04  35.009488  34.680598    Up              5            
2020-09-08  35.083302  34.675285    Up              6            
2020-09-09  35.150474  34.667884    Up              7 
2019-10-01  34.895256  38.744687  Down              1            
2019-10-02  34.736053  38.701518  Down              2            
2019-10-03  34.594877  38.665180  Down              3            
2019-10-04  34.466983  38.634488  Down              4            
2019-10-07  34.329222  38.605361  Down              5           
            

I would like the result to be:

Date        SMA_50     SMA_200    Trend Trend CumCount      T                                                     
2019-09-24  35.559013  38.942979  Down              1       1
2019-09-25  35.427135  38.904934  Down              2       1     
2019-09-26  35.295066  38.864042  Down              3       1     
2019-09-27  35.165275  38.827087  Down              4       1     
2019-09-30  35.035484  38.788046  Down              5       1     
2020-08-31  34.697343  34.689469    Up              1       1
2020-09-01  34.768881  34.691034    Up              2       1     
2020-09-02  34.852941  34.690655    Up              3       1     
2020-09-03  34.932827  34.686765    Up              4       1     
2020-09-04  35.009488  34.680598    Up              5       1     
2020-09-08  35.083302  34.675285    Up              6       1     
2020-09-09  35.150474  34.667884    Up              7       1
2019-10-01  34.895256  38.744687  Down              1       2     
2019-10-02  34.736053  38.701518  Down              2       2     
2019-10-03  34.594877  38.665180  Down              3       2     
2019-10-04  34.466983  38.634488  Down              4       2     
2019-10-07  34.329222  38.605361  Down              5       2

Based on Trend column, how can I count how many times up and down trends appear in the dataframe?

CodePudding user response:

# I ignored the "Trend CumCount" column in testing,
# but it shouldn't change anything.
df.loc[df.Trend.ne(df.Trend.shift(-1)), 'T'] = 1
df['T'] = df.groupby('Trend')['T'].cumsum().bfill()
print(df)

Output:

          Date     SMA_50    SMA_200 Trend    T
0   2019-09-24  35.559013  38.942979  Down  1.0
1   2019-09-25  35.427135  38.904934  Down  1.0
2   2019-09-26  35.295066  38.864042  Down  1.0
3   2019-09-27  35.165275  38.827087  Down  1.0
4   2019-09-30  35.035484  38.788046  Down  1.0
5   2020-08-31  34.697343  34.689469    Up  1.0
6   2020-09-01  34.768881  34.691034    Up  1.0
7   2020-09-02  34.852941  34.690655    Up  1.0
8   2020-09-03  34.932827  34.686765    Up  1.0
9   2020-09-04  35.009488  34.680598    Up  1.0
10  2020-09-08  35.083302  34.675285    Up  1.0
11  2020-09-09  35.150474  34.667884    Up  1.0
12  2019-10-01  34.895256  38.744687  Down  2.0
13  2019-10-02  34.736053  38.701518  Down  2.0
14  2019-10-03  34.594877  38.665180  Down  2.0
15  2019-10-04  34.466983  38.634488  Down  2.0
16  2019-10-07  34.329222  38.605361  Down  2.0

CodePudding user response:

Use a combination of replace, diff, and cumsum:

df["T"] = ((df.Trend.replace({"Down": 0, "Up": 1}).diff() == -1)   1).cumsum()

CodePudding user response:

You can identify when the Trend shifts back to the first value:

# equal to first Trend?
s = df['Trend'].eq(df['Trend'].iloc[0])
# when is the first value coming back for the first time after a change?
# perform cumsum to get T
df['T'] = (s&~s.shift(fill_value=False)).cumsum()

output:

          Date     SMA_50    SMA_200 Trend  Trend CumCount  T
0   2019-09-24  35.559013  38.942979  Down               1  1
1   2019-09-25  35.427135  38.904934  Down               2  1
2   2019-09-26  35.295066  38.864042  Down               3  1
3   2019-09-27  35.165275  38.827087  Down               4  1
4   2019-09-30  35.035484  38.788046  Down               5  1
5   2020-08-31  34.697343  34.689469    Up               1  1
6   2020-09-01  34.768881  34.691034    Up               2  1
7   2020-09-02  34.852941  34.690655    Up               3  1
8   2020-09-03  34.932827  34.686765    Up               4  1
9   2020-09-04  35.009488  34.680598    Up               5  1
10  2020-09-08  35.083302  34.675285    Up               6  1
11  2020-09-09  35.150474  34.667884    Up               7  1
12  2019-10-01  34.895256  38.744687  Down               1  2
13  2019-10-02  34.736053  38.701518  Down               2  2
14  2019-10-03  34.594877  38.665180  Down               3  2
15  2019-10-04  34.466983  38.634488  Down               4  2
16  2019-10-07  34.329222  38.605361  Down               5  2
  • Related