I would like to count how many down and up trends there are in this dataframe containing simple moving average data:
Date SMA_50 SMA_200 Trend Trend CumCount T
2019-09-24 35.559013 38.942979 Down 1
2019-09-25 35.427135 38.904934 Down 2
2019-09-26 35.295066 38.864042 Down 3
2019-09-27 35.165275 38.827087 Down 4
2019-09-30 35.035484 38.788046 Down 5
2020-08-31 34.697343 34.689469 Up 1
2020-09-01 34.768881 34.691034 Up 2
2020-09-02 34.852941 34.690655 Up 3
2020-09-03 34.932827 34.686765 Up 4
2020-09-04 35.009488 34.680598 Up 5
2020-09-08 35.083302 34.675285 Up 6
2020-09-09 35.150474 34.667884 Up 7
2019-10-01 34.895256 38.744687 Down 1
2019-10-02 34.736053 38.701518 Down 2
2019-10-03 34.594877 38.665180 Down 3
2019-10-04 34.466983 38.634488 Down 4
2019-10-07 34.329222 38.605361 Down 5
I would like the result to be:
Date SMA_50 SMA_200 Trend Trend CumCount T
2019-09-24 35.559013 38.942979 Down 1 1
2019-09-25 35.427135 38.904934 Down 2 1
2019-09-26 35.295066 38.864042 Down 3 1
2019-09-27 35.165275 38.827087 Down 4 1
2019-09-30 35.035484 38.788046 Down 5 1
2020-08-31 34.697343 34.689469 Up 1 1
2020-09-01 34.768881 34.691034 Up 2 1
2020-09-02 34.852941 34.690655 Up 3 1
2020-09-03 34.932827 34.686765 Up 4 1
2020-09-04 35.009488 34.680598 Up 5 1
2020-09-08 35.083302 34.675285 Up 6 1
2020-09-09 35.150474 34.667884 Up 7 1
2019-10-01 34.895256 38.744687 Down 1 2
2019-10-02 34.736053 38.701518 Down 2 2
2019-10-03 34.594877 38.665180 Down 3 2
2019-10-04 34.466983 38.634488 Down 4 2
2019-10-07 34.329222 38.605361 Down 5 2
Based on Trend column, how can I count how many times up and down trends appear in the dataframe?
CodePudding user response:
# I ignored the "Trend CumCount" column in testing,
# but it shouldn't change anything.
df.loc[df.Trend.ne(df.Trend.shift(-1)), 'T'] = 1
df['T'] = df.groupby('Trend')['T'].cumsum().bfill()
print(df)
Output:
Date SMA_50 SMA_200 Trend T
0 2019-09-24 35.559013 38.942979 Down 1.0
1 2019-09-25 35.427135 38.904934 Down 1.0
2 2019-09-26 35.295066 38.864042 Down 1.0
3 2019-09-27 35.165275 38.827087 Down 1.0
4 2019-09-30 35.035484 38.788046 Down 1.0
5 2020-08-31 34.697343 34.689469 Up 1.0
6 2020-09-01 34.768881 34.691034 Up 1.0
7 2020-09-02 34.852941 34.690655 Up 1.0
8 2020-09-03 34.932827 34.686765 Up 1.0
9 2020-09-04 35.009488 34.680598 Up 1.0
10 2020-09-08 35.083302 34.675285 Up 1.0
11 2020-09-09 35.150474 34.667884 Up 1.0
12 2019-10-01 34.895256 38.744687 Down 2.0
13 2019-10-02 34.736053 38.701518 Down 2.0
14 2019-10-03 34.594877 38.665180 Down 2.0
15 2019-10-04 34.466983 38.634488 Down 2.0
16 2019-10-07 34.329222 38.605361 Down 2.0
CodePudding user response:
Use a combination of replace
, diff
, and cumsum
:
df["T"] = ((df.Trend.replace({"Down": 0, "Up": 1}).diff() == -1) 1).cumsum()
CodePudding user response:
You can identify when the Trend shifts back to the first value:
# equal to first Trend?
s = df['Trend'].eq(df['Trend'].iloc[0])
# when is the first value coming back for the first time after a change?
# perform cumsum to get T
df['T'] = (s&~s.shift(fill_value=False)).cumsum()
output:
Date SMA_50 SMA_200 Trend Trend CumCount T
0 2019-09-24 35.559013 38.942979 Down 1 1
1 2019-09-25 35.427135 38.904934 Down 2 1
2 2019-09-26 35.295066 38.864042 Down 3 1
3 2019-09-27 35.165275 38.827087 Down 4 1
4 2019-09-30 35.035484 38.788046 Down 5 1
5 2020-08-31 34.697343 34.689469 Up 1 1
6 2020-09-01 34.768881 34.691034 Up 2 1
7 2020-09-02 34.852941 34.690655 Up 3 1
8 2020-09-03 34.932827 34.686765 Up 4 1
9 2020-09-04 35.009488 34.680598 Up 5 1
10 2020-09-08 35.083302 34.675285 Up 6 1
11 2020-09-09 35.150474 34.667884 Up 7 1
12 2019-10-01 34.895256 38.744687 Down 1 2
13 2019-10-02 34.736053 38.701518 Down 2 2
14 2019-10-03 34.594877 38.665180 Down 3 2
15 2019-10-04 34.466983 38.634488 Down 4 2
16 2019-10-07 34.329222 38.605361 Down 5 2