Append a conditional value to pandas dataframe-CodePudding

I am trying to check three consecutive values in a column and, if they are all positive, write a string value into a new column on the third row. My index is a date index.

I want to create a new column in my data frame and check in a loop whether three consecutive values in the column are positive. If they are, the result should be 'increasing'; if all three are negative, it should be 'decreasing'; otherwise it should be 'none'. This value should go in the new column, on the row of the last of the three values that were checked.

I have tried the code below, but whatever variation I use, it does not work.

df['num_change'] = df.num.diff()
result = []

for i in range(len(df)):

    if np.all(df['num_change'].values[i:i+3]) < 0:
        result.loc[[i+3],'Trend'] =('decreasing')

    elif np.all(df['num_change'].values[i:i+3]) > 0:
        result.loc[[i+3],'Trend'] =('increasing')

    else:
        result.loc[[i+3],'Trend'] =('none')


df["new_col"] = result

Unfortunately I am not able to insert an image here; I hope someone is patient enough to help me anyway.

CodePudding user response：

You could do this as follows:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col' : [1,2,3,-4,-5,-6,7,8,9]})

start = 0
end = 3
result = [None] * 2 # because trend will start after the third value

while end <= len(df.col):

    if np.all(df.col[start:end] > 0):
        result.append("Increasing")

    elif np.all(df.col[start:end] < 0):
        result.append("Decreasing")

    else:
        result.append(None)

    start += 1
    end += 1

df["new_col"] = result

In this solution, the while loop runs as long as the window still fits inside the column, i.e. while end is less than or equal to the length of df.col. Each iteration checks the three elements in df.col[start:end]. If all of them are greater than 0, "Increasing" is appended to the result. If all of them are less than 0, "Decreasing" is appended. Otherwise, None is appended.
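Where the comparison sits relative to np.all matters for this check. The small sketch below uses made-up sample values (not data from the question) to contrast np.all(window < 0) with np.all(window) < 0, the form used in the question's attempt. It is only an illustration of the difference, not part of the answer's code.

```python
import numpy as np

# Made-up window of three negative changes (illustrative values only).
window = np.array([-1.0, -2.0, -3.0])

# Comparison inside np.all: each element is tested against 0 first,
# then np.all checks that every test came out True.
inside = bool(np.all(window < 0))

# Comparison outside np.all: np.all(window) is True because every value is
# nonzero, and True < 0 is False, so a negative run is never detected.
outside = bool(np.all(window) < 0)

print(inside, outside)  # True False
```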

The first two elements of the result are None because the first complete window of three values ends at the third row, so the first two rows have nothing to compare. start and end begin at 0 and 3 respectively and are incremented by 1 after each iteration. The output is as shown below:

>>> df
   col     new_col
0    1        None
1    2        None
2    3  Increasing
3   -4        None
4   -5        None
5   -6  Decreasing
6    7        None
7    8        None
8    9  Increasing
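The same sliding-window idea could be adapted to the setup described in the question (a date index, a num column, and labels computed on num.diff()). The following is a minimal sketch under those assumptions: the helper name label_trend and the sample numbers are hypothetical, and rows without a full window of valid differences are labelled 'none'.

```python
import numpy as np
import pandas as pd

def label_trend(changes, window=3):
    """Label each row by the signs of the `window` values ending at that row."""
    values = changes.to_numpy(dtype=float)
    labels = []
    for end in range(1, len(values) + 1):
        chunk = values[max(0, end - window):end]
        if len(chunk) < window or np.isnan(chunk).any():
            labels.append("none")        # incomplete window or missing diff
        elif (chunk > 0).all():
            labels.append("increasing")
        elif (chunk < 0).all():
            labels.append("decreasing")
        else:
            labels.append("none")
    return pd.Series(labels, index=changes.index)

# Hypothetical sample data with a daily date index.
df = pd.DataFrame(
    {"num": [10, 11, 13, 16, 15, 12, 8, 9]},
    index=pd.date_range("2022-01-01", periods=8, freq="D"),
)
df["num_change"] = df["num"].diff()   # first row is NaN by construction
df["Trend"] = label_trend(df["num_change"])
print(df)
```

Because diff() leaves the first row as NaN, the first label that can be anything other than 'none' appears on the fourth row in this sketch.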

CodePudding user response：

This can be achieved with a custom rolling aggregation, without an explicit loop.

First we define the aggregation (it has to return a numeric value):

def trend(s):
    if (s < 0).all():
        return -1
    if (s > 0).all():
        return 1
    return 0

Now apply it and map the result to a label:

df['trend'] = (df['col'].rolling(3, min_periods = 1)
                        .apply(trend)
                        .map({1:'Increasing', -1:'Decreasing', 0:'none'})
            )

Output:

   col       trend
0    1  Increasing
1    2  Increasing
2    3  Increasing
3   -4        none
4   -5        none
5   -6  Decreasing
6    7        none
7    8        none
8    9  Increasing

Note that min_periods is set to 1 here, which has the effect of filling the first two rows based on sub-series of only 1 or 2 elements. If you don't want that, remove the min_periods argument; rolling then requires a full window of 3 values, so the first two rows come out as NaN.
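As a possible alternative to rolling.apply with a Python function, the same labels can be sketched with the built-in rolling min and max: a window is all positive exactly when its minimum is above 0, and all negative exactly when its maximum is below 0. This is a different technique from the one above, shown on the same sample column. Here rows without a full window are labelled 'none', which is a choice made for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3, -4, -5, -6, 7, 8, 9]})

window = 3
roll = df["col"].rolling(window)   # default min_periods equals the window size

all_positive = roll.min() > 0      # NaN for incomplete windows compares as False
all_negative = roll.max() < 0

df["trend"] = np.select(
    [all_positive, all_negative],
    ["Increasing", "Decreasing"],
    default="none",
)
print(df)
```

For long columns this avoids calling a Python function once per window, at the cost of being slightly less obvious to read.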