I have the following dataframe:
id date status
--------------------------------
0 2 2019-02-04 reserve
1 2 2020-01-01 reserve
2 2 2020-01-02 sold
3 3 2020-01-03 reserve
4 4 2020-01-03 booked
5 3 2020-02-05 reserve
6 4 2020-02-06 sold
7 3 2020-02-07 sold
I need to add weight for each row with the following condtion:
If row has status == sold
then weight
is 1
.
For the rest, I have to do the following: for the df[i]
find previous row with the same id
and status
and based on day diffrence apply weight like so (1 / days difference
).
The desired dataframe is:
id date status weight
----------------------------------------
0 2 2019-02-04 reserve 0
1 2 2020-01-01 reserve 0.003
2 2 2020-01-02 sold 1
3 3 2020-01-03 reserve 0
4 4 2020-01-03 booked 0
5 3 2020-02-05 reserve 0.030
6 4 2020-02-06 sold 1
7 3 2020-02-07 sold 1
Here's my attempt, for now I can't deal with finding previous row w/ condition:
df.date = pd.to_datetime(df.date)
df.sort_values(['date', 'id'], inplace=True)
df = df.reset_index(drop='index')
df['weight'] = np.where(df['status'] == 'sold', 1, 0)
df = df.reset_index(drop='index')
df1 = df[~(df.status == 'sold')]
retain_index = list(df1.index.values)
df1 = df1.reset_index(drop='index')
df1['diff_in_dates'] = df1.groupby('flat_id')['date'].diff().dt.days.fillna(0)
for index, row in df1.iterrows():
if index == 0:
pass
else:
if df1.loc[index, 'weight'] == 0:
if (df1.loc[index, 'status'] == df1.loc[index - 1, 'status']) & \
(df1.loc[index, 'flat_id'] == df1.loc[index - 1, 'flat_id']):
df1.loc[index, 'weight'] = 1 / df1.loc[index, 'diff_in_dates']
df1.weight.replace([np.inf, -np.inf], np.nan, inplace=True)
df1.weight = df1.weight.fillna(0)
#Here, in df1 I have weight = 0 for df[5] instead of 0.030
df = pd.DataFrame(df.append(df1, ignore_index = False)
CodePudding user response:
You can use a GroupBy.diff
difference, extract the days and get the inverse, then use numpy.where
to chose between the sold/non-sold status to know how to fill the default:
s = df.groupby(['id', 'status'])['date'].diff().dt.days.rdiv(1)
import numpy as np
df['weight'] = np.where(df['status'].eq('sold'), 1, s.fillna(0))
output:
id date status weight
0 2 2019-02-04 reserve 0.000000
1 2 2020-01-01 reserve 0.003021
2 2 2020-01-02 sold 1.000000
3 3 2020-01-03 reserve 0.000000
4 4 2020-01-03 booked 0.000000
5 3 2020-02-05 reserve 0.030303
6 4 2020-02-06 sold 1.000000
7 3 2020-02-07 sold 1.000000