Find number of consecutively increasing/decreasing values in a pandas column (and fill another col w-CodePudding

I am trying to create a new column for a dataframe. The column I use for it is a price column. Basically what I am trying to achieve is getting the number of times that the price has increased/decreased consecutively. I need this to be rather quick because the dataframes can be quite big.

For example the result should look like :

input = [1,2,3,2,1]

increase = [0,1,2,0,0]
decrease = [0,0,0,1,2]

CodePudding user response：

You can compute the diff and apply a cumsum on the positive/negative values:

df = pd.DataFrame({'col': [1,2,3,2,1]})
s = df['col'].diff()
df['increase'] = s.gt(0).cumsum().where(s.gt(0), 0)
df['decrease'] = s.lt(0).cumsum().where(s.lt(0), 0)

Output:

   col  increase  decrease
0    1         0         0
1    2         1         0
2    3         2         0
3    2         0         1
4    1         0         2

CodePudding user response：

Coding this manually using numpy might look like this

import numpy as np

input = np.array([1, 2, 3, 2, 1])

increase = np.zeros(len(input))
decrease = np.zeros(len(input))

for i in range(1, len(input)):
    if input[i] > input[i-1]:
        increase[i] = increase[i-1]   1
        decrease[i] = 0
    elif input[i] < input[i-1]:
        increase[i] = 0
        decrease[i] = decrease[i-1]   1
    else:
        increase[i] = 0
        decrease[i] = 0

increase  # array([0, 1, 2, 0, 0], dtype=int32)
decrease  # array([0, 0, 0, 1, 2], dtype=int32)

CodePudding user response：

data = {
    'input': [1,2,3,2,1]
}

df = pd.DataFrame(data)
diffs = df['input'].diff()
df['a'] = (df['input'] > df['input'].shift(periods=1, axis=0)).cumsum()-(df['input'] > df['input'].shift(periods=1, axis=0)).astype(int).cumsum() \
    .where(~(df['input'] > df['input'].shift(periods=1, axis=0))) \
    .ffill().fillna(0).astype(int)
df['b'] = (df['input'] < df['input'].shift(periods=1, axis=0)).cumsum()-(df['input'] < df['input'].shift(periods=1, axis=0)).astype(int).cumsum() \
    .where(~(df['input'] < df['input'].shift(periods=1, axis=0))) \
    .ffill().fillna(0).astype(int)
print(df)

output

   input  a  b
0      1  0  0
1      2  1  0
2      3  2  0
3      2  0  1
4      1  0  2