Sequentially update DataFrame value based on previous row's value

Not sure if there is a more elegant way to do what I want to do. Basically, I need to determine the current row's "position" value based on the "factor" value and the previous row's "position" value.

I tried to loop through the DataFrame and use if/else statements to update the value, but it is very clunky and the values didn't get updated.

Please kindly help, million thanks!

                       factor  position
time                                   
2022-05-13 06:00:00  0.489471         0
2022-05-13 07:00:00  0.711030         0
2022-05-13 08:00:00  0.566865         0
2022-05-13 09:00:00  0.489471         0
2022-05-13 10:00:00  0.288419         0

import pandas as pd

df = pd.DataFrame({'time': ['2022-05-13 06:00:00', '2022-05-13 07:00:00', '2022-05-13 08:00:00','2022-05-13 09:00:00', '2022-05-13 10:00:00'],
                   'factor': [0.489471, 0.711030, 0.566865, 0.489471, 0.288419],
                   'position': [0, 0, 0, 0, 0]})
df['time'] = pd.to_datetime(df['time'])
df.set_index('time', inplace=True)

threshold_2 = 0.7
threshold_1 = 0.35

for i in range(0, len(df)):
    # no position
    if i == 0 or df.iloc[i-1, :]['position'] == 0:
        if df.iloc[i, :]['factor'] > threshold_2:
            df.iloc[i, :]['position'] = 1
        else:
            df.iloc[i, :]['position'] = 0

    #has position
    elif df.iloc[i-1, :]['position'] != 0:
        if df.iloc[i, :]['factor'] > threshold_1:
            df.iloc[i, :]['position'] = 1
        else:
            df.iloc[i, :]['position'] = 0

CodePudding user response:

IIUC, you can use this vectorized alternative:

threshold_2 = 0.7
threshold_1 = 0.35

m1 = df['factor'].gt(threshold_2)

group = m1.cumsum()

m2 = df.loc[group>0, 'factor'].gt(threshold_1).groupby(group).cummin()

df['position'] = (m1|df.index.isin(m2[m2].index)).astype(int)

output:

                       factor  position
time                                   
2022-05-13 06:00:00  0.489471         0
2022-05-13 07:00:00  0.711030         1
2022-05-13 08:00:00  0.566865         1
2022-05-13 09:00:00  0.489471         1
2022-05-13 10:00:00  0.288419         0
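
To see why this works, it can help to look at the intermediate masks. The sketch below rebuilds the sample frame from the question and repeats the same steps; the comments are one reading of each step rather than something stated in the answer, and the approach assumes the index labels are unique (df.index.isin would otherwise match every duplicate).

```python
import pandas as pd

df = pd.DataFrame(
    {"factor": [0.489471, 0.711030, 0.566865, 0.489471, 0.288419]},
    index=pd.to_datetime([
        "2022-05-13 06:00:00", "2022-05-13 07:00:00", "2022-05-13 08:00:00",
        "2022-05-13 09:00:00", "2022-05-13 10:00:00",
    ]),
)
threshold_2 = 0.7   # factor must exceed this to open a position
threshold_1 = 0.35  # factor must exceed this to keep a position

# Rows where the factor is high enough to open a position.
m1 = df["factor"].gt(threshold_2)

# Every entry signal starts a new group; rows before the first signal are group 0.
group = m1.cumsum()

# Inside each group, stay True only until the factor first fails threshold_1.
m2 = df.loc[group > 0, "factor"].gt(threshold_1).groupby(group).cummin()

# A row holds a position if it is an entry row or still True in m2.
df["position"] = (m1 | df.index.isin(m2[m2].index)).astype(int)

print(m1.tolist())              # [False, True, False, False, False]
print(group.tolist())           # [0, 1, 1, 1, 1]
print(m2.tolist())              # [True, True, True, False]
print(df["position"].tolist())  # [0, 1, 1, 1, 0]
```

The cummin step is what carries the "previous row" dependency: once a row in a group fails threshold_1, every later row in that group is False until a new entry signal starts the next group.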

CodePudding user response:

When you use chained indexing, the order and type of the indexing operation partially determine whether the result is a slice into the original object, or a copy of the slice.

Read more about it in the pandas user guide section "Returning a view versus a copy".
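
As a small illustration of the difference, here is a sketch on a made-up three-row frame (the values are not from the question). It assumes the frame has columns of different dtypes, as in the question, so that selecting a row always produces a new object.

```python
import warnings

import pandas as pd

# Hypothetical toy data: one float column and one int column.
df = pd.DataFrame({"factor": [0.2, 0.8, 0.5], "position": [0, 0, 0]})

# Chained indexing: df.iloc[1, :] first builds a new Series for the row
# (the columns have different dtypes, so it cannot be a view of df), and
# the assignment then lands on that temporary object, not on df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about chained assignment
    df.iloc[1, :]["position"] = 1
after_chained = df["position"].tolist()

# Single indexing call: pandas sets the value on df itself.
df.loc[df.index[1], "position"] = 1
after_loc = df["position"].tolist()

print(after_chained)  # [0, 0, 0]
print(after_loc)      # [0, 1, 0]
```

The first assignment is silently lost, which matches the symptom in the question: the loop runs, but the position column never changes.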

I made a few small changes to your code so that it works: each assignment now goes through a single df.loc call instead of chained indexing. You also don't need the inner if/else; you can just write 1 if foo else 0.

import pandas as pd

df = pd.DataFrame({'time': ['2022-05-13 06:00:00', '2022-05-13 07:00:00', '2022-05-13 08:00:00','2022-05-13 09:00:00', '2022-05-13 10:00:00'],
                   'factor': [0.489471, 0.711030, 0.566865, 0.489471, 0.288419],
                   'position': [0, 0, 0, 0, 0]})
df['time'] = pd.to_datetime(df['time'])
df.set_index('time', inplace=True)
threshold_2 = 0.7
threshold_1 = 0.35

for i in range(len(df)):
    # no position (first row, or previous row is flat): enter only above threshold_2
    if i == 0 or df.loc[df.index[i-1], 'position'] == 0:
        df.loc[df.index[i], 'position'] = 1 if df.loc[df.index[i], 'factor'] > threshold_2 else 0
    # has position: keep it while factor stays above threshold_1
    else:
        df.loc[df.index[i], 'position'] = 1 if df.loc[df.index[i], 'factor'] > threshold_1 else 0

print(df)

                       factor  position
time                                   
2022-05-13 06:00:00  0.489471         0
2022-05-13 07:00:00  0.711030         1
2022-05-13 08:00:00  0.566865         1
2022-05-13 09:00:00  0.489471         1
2022-05-13 10:00:00  0.288419         0
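
If you keep the loop, one option is to carry the previous position in a local variable, iterate over the raw values, and write the column once at the end, which avoids a df.loc lookup per row. This is only a minimal sketch and not part of the code above; hysteresis_positions and its parameter names (enter, hold) are hypothetical.

```python
import pandas as pd


def hysteresis_positions(factors, enter, hold):
    """Return 0/1 positions: open above `enter`, keep while above `hold`."""
    positions = []
    prev = 0  # no position before the first row
    for value in factors:
        # The threshold to beat depends on whether a position is already held.
        limit = hold if prev else enter
        prev = 1 if value > limit else 0
        positions.append(prev)
    return positions


df = pd.DataFrame(
    {"factor": [0.489471, 0.711030, 0.566865, 0.489471, 0.288419]},
    index=pd.to_datetime([
        "2022-05-13 06:00:00", "2022-05-13 07:00:00", "2022-05-13 08:00:00",
        "2022-05-13 09:00:00", "2022-05-13 10:00:00",
    ]),
)

# Assign the whole column in one step instead of row by row.
df["position"] = hysteresis_positions(df["factor"].to_numpy(), enter=0.7, hold=0.35)
print(df["position"].tolist())  # [0, 1, 1, 1, 0]
```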