Home > Mobile >  Find previous row with the same ID and condition on column value to assign weight
Find previous row with the same ID and condition on column value to assign weight

Time:06-10

I have the following dataframe:

    id      date        status
--------------------------------
0    2     2019-02-04   reserve    
1    2     2020-01-01   reserve    
2    2     2020-01-02   sold      
3    3     2020-01-03   reserve    
4    4     2020-01-03   booked     
5    3     2020-02-05   reserve    
6    4     2020-02-06   sold       
7    3     2020-02-07   sold

I need to add weight for each row with the following condtion:

If row has status == sold then weight is 1.
For the rest, I have to do the following: for the df[i] find previous row with the same id and status and based on day diffrence apply weight like so (1 / days difference).

The desired dataframe is:

    id      date        status    weight
----------------------------------------
0    2     2019-02-04   reserve     0 
1    2     2020-01-01   reserve   0.003 
2    2     2020-01-02   sold        1
3    3     2020-01-03   reserve     0 
4    4     2020-01-03   booked      0 
5    3     2020-02-05   reserve   0.030  
6    4     2020-02-06   sold        1 
7    3     2020-02-07   sold        1

Here's my attempt, for now I can't deal with finding previous row w/ condition:

df.date = pd.to_datetime(df.date)
df.sort_values(['date', 'id'], inplace=True)
df = df.reset_index(drop='index')

df['weight'] = np.where(df['status'] == 'sold', 1, 0)
df = df.reset_index(drop='index')
df1 = df[~(df.status == 'sold')]
retain_index = list(df1.index.values)
df1 = df1.reset_index(drop='index')

df1['diff_in_dates'] = df1.groupby('flat_id')['date'].diff().dt.days.fillna(0)

for index, row in df1.iterrows():
    if index == 0:
        pass
    else:
        if df1.loc[index, 'weight'] == 0:
            if (df1.loc[index, 'status'] == df1.loc[index - 1, 'status']) & \
                                    (df1.loc[index, 'flat_id'] == df1.loc[index - 1, 'flat_id']):
                df1.loc[index, 'weight'] = 1 / df1.loc[index, 'diff_in_dates']

df1.weight.replace([np.inf, -np.inf], np.nan, inplace=True)
df1.weight = df1.weight.fillna(0)

#Here, in df1 I have weight = 0 for df[5] instead of 0.030

df = pd.DataFrame(df.append(df1, ignore_index = False)

CodePudding user response:

You can use a GroupBy.diff difference, extract the days and get the inverse, then use numpy.where to chose between the sold/non-sold status to know how to fill the default:

s = df.groupby(['id', 'status'])['date'].diff().dt.days.rdiv(1)

import numpy as np
df['weight'] = np.where(df['status'].eq('sold'), 1, s.fillna(0))

output:

   id       date   status    weight
0   2 2019-02-04  reserve  0.000000
1   2 2020-01-01  reserve  0.003021
2   2 2020-01-02     sold  1.000000
3   3 2020-01-03  reserve  0.000000
4   4 2020-01-03   booked  0.000000
5   3 2020-02-05  reserve  0.030303
6   4 2020-02-06     sold  1.000000
7   3 2020-02-07     sold  1.000000
  • Related