Home > Software design >  Loop over dataframe and change value by compared to the previous line value
Loop over dataframe and change value by compared to the previous line value

Time:07-05

i need to do a loop over rows of dataframe. My goal is to change value compared to the previous line value. For instance, it's my dataframe :

index date id value_dex
90256 2021-05-10 01:45:20 101904285 7.6
90257 2021-05-10 01:45:20 101904285 7.6
90258 2021-05-10 02:00:00 101904285 NaN
90259 2021-05-10 02:00:44 101904285 NaN
90260 2021-05-10 02:00:44 101904285 NaN
90261 2021-05-10 02:00:44 101904285 NaN
90262 2021-05-10 02:21:51 101904285 NaN
90263 2021-05-10 02:21:51 101904285 NaN
90264 2021-05-10 02:21:51 101904285 NaN
90265 2021-05-10 02:21:51 101904285 NaN
90266 2021-05-10 03:00:00 101904285 NaN
90267 2021-05-10 03:00:44 101904285 12.0
90268 2021-05-10 03:00:44 101904285 NaN
90269 2021-05-10 03:00:44 101904285 NaN

I want to change the values to 30 minutes if there are no intermediate values. For this I use a dictionary that for each id will update the last date when the value was changed. The result I want to obtain is the following :

index date id value_dex
90256 2021-05-10 01:45:20 101904285 7.6
90257 2021-05-10 01:45:20 101904285 7.6
90258 2021-05-10 02:00:00 101904285 7.6
90259 2021-05-10 02:00:44 101904285 7.6
90260 2021-05-10 02:00:44 101904285 7.6
90261 2021-05-10 02:00:44 101904285 7.6
90262 2021-05-10 02:21:51 101904285 NaN
90263 2021-05-10 02:21:51 101904285 NaN
90264 2021-05-10 02:21:51 101904285 NaN
90265 2021-05-10 02:21:51 101904285 NaN
90266 2021-05-10 03:00:00 101904285 NaN
90267 2021-05-10 03:00:44 101904285 12.0
90268 2021-05-10 03:00:44 101904285 12.0
90269 2021-05-10 03:00:44 101904285 12.0

I have some problems with my loop because for some lines, it do not update the values, here my code :

    name_columns = ['value_dex']
    # List of ID
    list_id = []
    for i, row in data.iterrows():
        if(isNaN(data.at[i, column])):
            list_id.append(data.at[i, 'id'])
            
    # Remove duplicate
    id_row = set(list_id)
    id_row = list(id_row)
    
    # Create dictionnary
    timestp = [0] * len(id_row)
    dictionary = dict(zip(id_row, timestp))
    
    # Add initial value
    first_date = "2000-01-01 00:00:00"
    first_date = pd.to_datetime(first_date, infer_datetime_format=True)
    for key in dictionary:
        dictionary.update({key: first_date})
    
    # Print dictionnary
    #for key in dictionary:
    #    print(key, '->', dictionary[key])
    
    # For each line of the dataframe
    for i, row in data.iterrows():
        
        # if column value is not nan
        if(isNaN(data.at[i, column])):
            
            # calculation of the 30 minutes slot
            str_time = data.at[i, 'date']
            str_time = pd.to_datetime(str_time, infer_datetime_format=True)
            final_time = str_time   timedelta(minutes=30)
            key_id = data.at[i, 'id']
            column_value = data.at[i, column]
            
            
            if(i == (len(data)-1)):
                j = i
            else:
                j = i   1
            
            while data.at[j, 'date'] < final_time and data.at[j, 'date'] > dictionary[key_id] and data.at[j, 'id'] == key_id :
                if(isNaN(data.at[j, column]) == False):
                    data.at[j, column] = column_value
                   
                    j  = 1
                else:
                    new_time = data.at[i, 'date']   timedelta(seconds=1)
                    dictionary[key_id] = new_time
                    break

Do you have any solutions that could resolve my problems ? I think my loop is not very optimized ... Thanks a lot !

CodePudding user response:

IIUC your code, you can use rolling:

ffill = lambda x: pd.DataFrame(x.rolling('30T', on='date')['value_dex'].max(),
                               index=x.index)

df['value_dex'] = df.groupby('id', as_index=False).apply(ffill)
print(df)

# Output
                     date         id  value_dex
90256 2021-05-10 01:45:20  101904285        7.6
90257 2021-05-10 01:45:20  101904285        7.6
90258 2021-05-10 02:00:00  101904285        7.6
90259 2021-05-10 02:00:44  101904285        7.6
90260 2021-05-10 02:00:44  101904285        7.6
90261 2021-05-10 02:00:44  101904285        7.6
90262 2021-05-10 02:21:51  101904285        7.6
90263 2021-05-10 02:21:51  101904285        7.6
90264 2021-05-10 02:21:51  101904285        7.6
90265 2021-05-10 02:21:51  101904285        7.6
90266 2021-05-10 03:00:00  101904285        NaN
90267 2021-05-10 03:00:44  101904285       12.0
90268 2021-05-10 03:00:44  101904285       12.0
90269 2021-05-10 03:00:44  101904285       12.0
  • Related