I am new on Python and I thank you therefore for your help.
Here is my dF :
df = pd.DataFrame{'Id': {0: 'ae9b0886-7e2b-4c37-a3a3',
1: 'ae9b0886-7e2b-4c37-a3a3',
2: 'ae290c85-9dfb-440f-becb',
3: 'ae290c85-9dfb-440f-becb',
4: 'ae290c85-9dfb-440f-becb',
5: 'ae290c85-9dfb-440f-becb',
6: 'ae290c85-9dfb-440f-becb',
7: 'ae290c85-9dfb-440f-becb',
8: 'ae290c85-9dfb-440f-becb',
9: 'ae290c85-9dfb-440f-becb',
10: 'ae290c85-9dfb-440f-becb',
11: 'b92faffa-89cd-48db-aafd',
12: 'b92faffa-89cd-48db-aafd',
13: '88f8b058-8b8a-4a80-84a2',
14: '88f8b058-8b8a-4a80-84a2',
15: '88f8b058-8b8a-4a80-84a2',
16: '88f8b058-8b8a-4a80-84a2',
17: '88f8b058-8b8a-4a80-84a2',
18: '88f8b058-8b8a-4a80-84a2',
19: '88f8b058-8b8a-4a80-84a2'},
> 'lastname': {0: 'Baco',
1: 'Baco',
2: 'Azi',
3: 'Azi',
4: 'Azi',
5: 'Azi',
6: 'Azi',
7: 'Azi',
8: 'Azi',
9: 'Azi',
10: 'Azi',
11: 'SOFTY',
12: 'SOFTY',
13: 'Dup',
14: 'Dup',
15: 'Dup',
16: 'Dup',
17: 'Dup',
18: 'Dup',
19: 'Dup'},
> 'ID_VALIDATION': {0: 82552217,
1: 82544581,
2: 82538959,
3: 82405234,
4: 82376176,
5: 82358274,
6: 82347060,
7: 82294311,
8: 82203773,
9: 82176910,
10: 82575141,
11: 82396159,
12: 82393258,
13: 82364079,
14: 82382504,
15: 82532881,
16: 82163257,
17: 82267321,
18: 82341659,
19: 82305609},
> 'DTHR_OPERATION': {0: Timestamp('2022-09-28 08:10:41'),
1: Timestamp('2022-09-28 12:06:44'),
2: Timestamp('2022-09-28 07:22:30'),
3: Timestamp('2022-09-23 07:23:13'),
4: Timestamp('2022-09-22 07:31:07'),
5: Timestamp('2022-09-21 15:38:03'),
6: Timestamp('2022-09-21 07:25:34'),
7: Timestamp('2022-09-19 17:00:03'),
8: Timestamp('2022-09-16 07:24:12'),
9: Timestamp('2022-09-15 07:21:46'),
10: Timestamp('2022-09-29 07:23:11'),
11: Timestamp('2022-09-22 16:08:38'),
12: Timestamp('2022-09-22 15:40:54'),
13: Timestamp('2022-09-22 07:03:44'),
14: Timestamp('2022-09-22 15:12:24'),
15: Timestamp('2022-09-28 07:03:53'),
16: Timestamp('2022-09-15 07:03:32'),
17: Timestamp('2022-09-19 07:03:53'),
18: Timestamp('2022-09-21 07:03:14'),
19: Timestamp('2022-09-20 07:03:47')},
> 'TYPE_OPER_VALIDATION': {0: 1,
1: 1,
2: 1,
3: 1,
4: 1,
5: 1,
6: 1,
7: 1,
8: 1,
9: 1,
10: 1,
11: 3,
12: 1,
13: 1,
14: 1,
15: 1,
16: 1,
17: 1,
18: 1,
19: 1}}
I want to add a new date column df['DTHR_OPERATION_bis]'
with the closest date from DTHR_OPERATION
, for a particular client (Id
). To create such column, I am trying to do a junction, first by Id
, then by the closest date.
I am struggling to do so as I actually want to merge by Id
and then merge_asof by DTHR_OPERATION
within the same junction. Is it even possible ?
Many thanks for the support :)
Here is what I tried but it doesn't work out and the code is incomplete :
df1['DTHR_OPERATION_bis'] = df1.loc[:, 'DTHR_OPERATION']
df1.head()
tol = pd.Timedelta('1 day')
df3 = pd.merge_asof(left=df1['DTHR_OPERATION'],right=df1['DTHR_OPERATION_bis'],direction='nearest',tolerance=tol)
CodePudding user response:
You can use merge_asof
with the correct parameters:
df['DTHR_OPERATION'] = pd.to_datetime(df['DTHR_OPERATION'])
out = pd.merge_asof(
# left dataframe saving original index as column
df.reset_index().sort_values(by='DTHR_OPERATION'),
# right: same dataframe renaming date column to _bis
(df[['Id', 'DTHR_OPERATION']].sort_values(by='DTHR_OPERATION')
.rename(columns={'DTHR_OPERATION': 'DTHR_OPERATION_bis'})),
# define Id and date to join
by='Id', left_on='DTHR_OPERATION', right_on='DTHR_OPERATION_bis',
# we want the closest but not the same date
direction='nearest', allow_exact_matches=False
).set_index('index').sort_index()
output:
Id lastname ID_VALIDATION DTHR_OPERATION TYPE_OPER_VALIDATION DTHR_OPERATION_bis
index
0 ae9b0886-7e2b-4c37-a3a3 Baco 82552217 2022-09-28 08:10:41 1 2022-09-28 12:06:44
1 ae9b0886-7e2b-4c37-a3a3 Baco 82544581 2022-09-28 12:06:44 1 2022-09-28 08:10:41
2 ae290c85-9dfb-440f-becb Azi 82538959 2022-09-28 07:22:30 1 2022-09-29 07:23:11
3 ae290c85-9dfb-440f-becb Azi 82405234 2022-09-23 07:23:13 1 2022-09-22 07:31:07
4 ae290c85-9dfb-440f-becb Azi 82376176 2022-09-22 07:31:07 1 2022-09-21 15:38:03
5 ae290c85-9dfb-440f-becb Azi 82358274 2022-09-21 15:38:03 1 2022-09-21 07:25:34
6 ae290c85-9dfb-440f-becb Azi 82347060 2022-09-21 07:25:34 1 2022-09-21 15:38:03
7 ae290c85-9dfb-440f-becb Azi 82294311 2022-09-19 17:00:03 1 2022-09-21 07:25:34
8 ae290c85-9dfb-440f-becb Azi 82203773 2022-09-16 07:24:12 1 2022-09-15 07:21:46
9 ae290c85-9dfb-440f-becb Azi 82176910 2022-09-15 07:21:46 1 2022-09-16 07:24:12
10 ae290c85-9dfb-440f-becb Azi 82575141 2022-09-29 07:23:11 1 2022-09-28 07:22:30
11 b92faffa-89cd-48db-aafd SOFTY 82396159 2022-09-22 16:08:38 3 2022-09-22 15:40:54
12 b92faffa-89cd-48db-aafd SOFTY 82393258 2022-09-22 15:40:54 1 2022-09-22 16:08:38
13 88f8b058-8b8a-4a80-84a2 Dup 82364079 2022-09-22 07:03:44 1 2022-09-22 15:12:24
14 88f8b058-8b8a-4a80-84a2 Dup 82382504 2022-09-22 15:12:24 1 2022-09-22 07:03:44
15 88f8b058-8b8a-4a80-84a2 Dup 82532881 2022-09-28 07:03:53 1 2022-09-22 15:12:24
16 88f8b058-8b8a-4a80-84a2 Dup 82163257 2022-09-15 07:03:32 1 2022-09-19 07:03:53
17 88f8b058-8b8a-4a80-84a2 Dup 82267321 2022-09-19 07:03:53 1 2022-09-20 07:03:47
18 88f8b058-8b8a-4a80-84a2 Dup 82341659 2022-09-21 07:03:14 1 2022-09-20 07:03:47
19 88f8b058-8b8a-4a80-84a2 Dup 82305609 2022-09-20 07:03:47 1 2022-09-21 07:03:14