I have a dataframe with multiple dimensional index using pandas
. Say employee_id and date. Now I want to update records that's prior to a specific date, say 2020-01-01. To be consistent with other codes, the update was made using np.where
. So how can I add this date filter into this assignment, df['Sale'] = np.where(df.sale_actual>df.sale_expect, df.sale_actual, df.sale_expect)
. Thanks.
CodePudding user response:
One solution is to create a boolean mask from the date
part of the index and use it to mask out non-relevant rows during update
Sample input dataframe
df = pd.DataFrame([[0, '2019-01-01', 100, 200, 0], [1, '2019-02-01', 150, 100, 0], [0, '2021-12-12', 200, 100, 0]], columns=['id', 'date','sale_expect', 'sale_actual', 'sale'])
df['date'] = pd.to_datetime(df['date'])
df = df.set_index(['id', 'date'], drop=True)
sale_expect sale_actual sale
id date
0 2019-01-01 100 200 0
1 2019-02-01 150 100 0
0 2021-12-12 200 100 0
Solution
mask = df.index.get_level_values(1) < np.datetime64('2020-01-01')
df.loc[mask, 'sale'] = np.where(df[mask].sale_actual>df[mask].sale_expect, df[mask].sale_actual, df[mask].sale_expect)
Result
sale_expect sale_actual sale
id date
0 2019-01-01 100 200 200
1 2019-02-01 150 100 150
0 2021-12-12 200 100 0