I have a dataframe (sample) like this:
import pandas as pd
data = [['A', '2022-09-01', False, 2], ['A', '2022-09-02', False, 3], ['A', '2022-09-03', True, 1],
['A', '2022-09-05', False, 4], ['A', '2022-09-08', True, 4], ['A', '2022-09-09', False, 2],
['B', '2022-09-03', False, 4], ['B', '2022-09-05', True, 5], ['B', '2022-09-06', False, 7],
['B', '2022-09-09', True, 4], ['B', '2022-09-10', False, 2], ['B', '2022-09-11', False, 3]]
df = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'val'])
group date indicator val
0 A 2022-09-01 False 2
1 A 2022-09-02 False 3
2 A 2022-09-03 True 1
3 A 2022-09-05 False 4
4 A 2022-09-08 True 4
5 A 2022-09-09 False 2
6 B 2022-09-03 False 4
7 B 2022-09-05 True 5
8 B 2022-09-06 False 7
9 B 2022-09-09 True 4
10 B 2022-09-10 False 2
11 B 2022-09-11 False 3
I would like to create a column called Diff
, which shows the difference of rows with its nearest (depends on date
) conditional rows (indicator == True) where the conditional rows have a value of 0 per group
. Here is the desired output:
data = [['A', '2022-09-01', False, 2, 1], ['A', '2022-09-02', False, 3, 2], ['A', '2022-09-03', True, 1, 0],
['A', '2022-09-05', False, 4, 3], ['A', '2022-09-08', True, 4, 0], ['A', '2022-09-09', False, 2, -2],
['B', '2022-09-03', False, 4, -1], ['B', '2022-09-05', True, 5, 0], ['B', '2022-09-06', False, 7, 2],
['B', '2022-09-09', True, 4, 0], ['B', '2022-09-10', False, 2, -2], ['B', '2022-09-11', False, 3, -1]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'val', 'Diff'])
group date indicator val Diff
0 A 2022-09-01 False 2 1
1 A 2022-09-02 False 3 2
2 A 2022-09-03 True 1 0
3 A 2022-09-05 False 4 3
4 A 2022-09-08 True 4 0
5 A 2022-09-09 False 2 -2
6 B 2022-09-03 False 4 -1
7 B 2022-09-05 True 5 0
8 B 2022-09-06 False 7 2
9 B 2022-09-09 True 4 0
10 B 2022-09-10 False 2 -2
11 B 2022-09-11 False 3 -1
As you can see it returns the difference respectively with the nearest indicator == True
rows per group
where the conditioned rows have a Diff
of 0. So I was wondering if anyone knows have to get the desired result using pandas
?
Extra info column Diff:
Let's take group A as an example. The column Diff
is calculated by the difference with respect to the nearest row with indicator True. So for example:
- Row 1 is 2 - 1 = 1 with respect to row 3 (nearest True row based on date).
- Row 2 is 3 - 1 = 2 with respect to row 3.
- Row 4 is 4 - 1 = 3 with respect to row 3.
- Row 6 is 2 - 4 = -2 with respect to row 5 (nearest True row based on date).
- The rows with True have a value of 0 in Diff because everything is calculated with respect to these rows.
CodePudding user response:
IIUC use merge_asof
with filtered rows by indicator
and subtract column val
:
df['date'] = pd.to_datetime(df['date'] )
df = df.sort_values('date')
df['Diff'] = df['val'].sub(pd.merge_asof(df,
df[df['indicator']], on='date', by='group', direction='nearest')['val_y'])
df = df.sort_index()