Home > front end >  Python merge by closest date for a particular client
Python merge by closest date for a particular client

Time:10-27

I am new on Python and I thank you therefore for your help.

Here is my dF :

df = pd.DataFrame{'Id': {0: 'ae9b0886-7e2b-4c37-a3a3',
1: 'ae9b0886-7e2b-4c37-a3a3',
2: 'ae290c85-9dfb-440f-becb',
3: 'ae290c85-9dfb-440f-becb',
4: 'ae290c85-9dfb-440f-becb',
5: 'ae290c85-9dfb-440f-becb',
6: 'ae290c85-9dfb-440f-becb',
7: 'ae290c85-9dfb-440f-becb',
8: 'ae290c85-9dfb-440f-becb',
9: 'ae290c85-9dfb-440f-becb',
10: 'ae290c85-9dfb-440f-becb',
11: 'b92faffa-89cd-48db-aafd',
12: 'b92faffa-89cd-48db-aafd',
13: '88f8b058-8b8a-4a80-84a2',
14: '88f8b058-8b8a-4a80-84a2',
15: '88f8b058-8b8a-4a80-84a2',
16: '88f8b058-8b8a-4a80-84a2',
17: '88f8b058-8b8a-4a80-84a2',
18: '88f8b058-8b8a-4a80-84a2',
19: '88f8b058-8b8a-4a80-84a2'},
>  'lastname': {0: 'Baco',
1: 'Baco',
2: 'Azi',
3: 'Azi',
4: 'Azi',
5: 'Azi',
6: 'Azi',
7: 'Azi',
8: 'Azi',
9: 'Azi',
10: 'Azi',
11: 'SOFTY',
12: 'SOFTY',
13: 'Dup',
14: 'Dup',
15: 'Dup',
16: 'Dup',
17: 'Dup',
18: 'Dup',
19: 'Dup'},
>  'ID_VALIDATION': {0: 82552217,
1: 82544581,
2: 82538959,
3: 82405234,
4: 82376176,
5: 82358274,
6: 82347060,
7: 82294311,
8: 82203773,
9: 82176910,
10: 82575141,
11: 82396159,
12: 82393258,
13: 82364079,
14: 82382504,
15: 82532881,
16: 82163257,
17: 82267321,
18: 82341659,
19: 82305609},
>  'DTHR_OPERATION': {0: Timestamp('2022-09-28 08:10:41'),
1: Timestamp('2022-09-28 12:06:44'),
2: Timestamp('2022-09-28 07:22:30'),
3: Timestamp('2022-09-23 07:23:13'),
4: Timestamp('2022-09-22 07:31:07'),
5: Timestamp('2022-09-21 15:38:03'),
6: Timestamp('2022-09-21 07:25:34'),
7: Timestamp('2022-09-19 17:00:03'),
8: Timestamp('2022-09-16 07:24:12'),
9: Timestamp('2022-09-15 07:21:46'),
10: Timestamp('2022-09-29 07:23:11'),
11: Timestamp('2022-09-22 16:08:38'),
12: Timestamp('2022-09-22 15:40:54'),
13: Timestamp('2022-09-22 07:03:44'),
14: Timestamp('2022-09-22 15:12:24'),
15: Timestamp('2022-09-28 07:03:53'),
16: Timestamp('2022-09-15 07:03:32'),
17: Timestamp('2022-09-19 07:03:53'),
18: Timestamp('2022-09-21 07:03:14'),
19: Timestamp('2022-09-20 07:03:47')},
>  'TYPE_OPER_VALIDATION': {0: 1,
1: 1,
2: 1,
3: 1,
4: 1,
5: 1,
6: 1,
7: 1,
8: 1,
9: 1,
10: 1,
11: 3,
12: 1,
13: 1,
14: 1,
15: 1,
16: 1,
17: 1,
18: 1,
19: 1}}

I want to add a new date column df['DTHR_OPERATION_bis]' with the closest date from DTHR_OPERATION, for a particular client (Id). To create such column, I am trying to do a junction, first by Id, then by the closest date.

I am struggling to do so as I actually want to merge by Id and then merge_asof by DTHR_OPERATION within the same junction. Is it even possible ?

Many thanks for the support :)

Here is what I tried but it doesn't work out and the code is incomplete :

df1['DTHR_OPERATION_bis'] = df1.loc[:, 'DTHR_OPERATION']
df1.head()

tol = pd.Timedelta('1 day')
df3 = pd.merge_asof(left=df1['DTHR_OPERATION'],right=df1['DTHR_OPERATION_bis'],direction='nearest',tolerance=tol)


CodePudding user response:

You can use merge_asof with the correct parameters:

df['DTHR_OPERATION'] = pd.to_datetime(df['DTHR_OPERATION'])

out = pd.merge_asof(
              # left dataframe saving original index as column
              df.reset_index().sort_values(by='DTHR_OPERATION'),
              # right: same dataframe renaming date column to _bis
              (df[['Id', 'DTHR_OPERATION']].sort_values(by='DTHR_OPERATION')
               .rename(columns={'DTHR_OPERATION': 'DTHR_OPERATION_bis'})),
              # define Id and date to join
              by='Id', left_on='DTHR_OPERATION', right_on='DTHR_OPERATION_bis',
              # we want the closest but not the same date
              direction='nearest', allow_exact_matches=False
             ).set_index('index').sort_index()

output:

                            Id lastname  ID_VALIDATION      DTHR_OPERATION  TYPE_OPER_VALIDATION  DTHR_OPERATION_bis
index                                                                                                               
0      ae9b0886-7e2b-4c37-a3a3     Baco       82552217 2022-09-28 08:10:41                     1 2022-09-28 12:06:44
1      ae9b0886-7e2b-4c37-a3a3     Baco       82544581 2022-09-28 12:06:44                     1 2022-09-28 08:10:41
2      ae290c85-9dfb-440f-becb      Azi       82538959 2022-09-28 07:22:30                     1 2022-09-29 07:23:11
3      ae290c85-9dfb-440f-becb      Azi       82405234 2022-09-23 07:23:13                     1 2022-09-22 07:31:07
4      ae290c85-9dfb-440f-becb      Azi       82376176 2022-09-22 07:31:07                     1 2022-09-21 15:38:03
5      ae290c85-9dfb-440f-becb      Azi       82358274 2022-09-21 15:38:03                     1 2022-09-21 07:25:34
6      ae290c85-9dfb-440f-becb      Azi       82347060 2022-09-21 07:25:34                     1 2022-09-21 15:38:03
7      ae290c85-9dfb-440f-becb      Azi       82294311 2022-09-19 17:00:03                     1 2022-09-21 07:25:34
8      ae290c85-9dfb-440f-becb      Azi       82203773 2022-09-16 07:24:12                     1 2022-09-15 07:21:46
9      ae290c85-9dfb-440f-becb      Azi       82176910 2022-09-15 07:21:46                     1 2022-09-16 07:24:12
10     ae290c85-9dfb-440f-becb      Azi       82575141 2022-09-29 07:23:11                     1 2022-09-28 07:22:30
11     b92faffa-89cd-48db-aafd    SOFTY       82396159 2022-09-22 16:08:38                     3 2022-09-22 15:40:54
12     b92faffa-89cd-48db-aafd    SOFTY       82393258 2022-09-22 15:40:54                     1 2022-09-22 16:08:38
13     88f8b058-8b8a-4a80-84a2      Dup       82364079 2022-09-22 07:03:44                     1 2022-09-22 15:12:24
14     88f8b058-8b8a-4a80-84a2      Dup       82382504 2022-09-22 15:12:24                     1 2022-09-22 07:03:44
15     88f8b058-8b8a-4a80-84a2      Dup       82532881 2022-09-28 07:03:53                     1 2022-09-22 15:12:24
16     88f8b058-8b8a-4a80-84a2      Dup       82163257 2022-09-15 07:03:32                     1 2022-09-19 07:03:53
17     88f8b058-8b8a-4a80-84a2      Dup       82267321 2022-09-19 07:03:53                     1 2022-09-20 07:03:47
18     88f8b058-8b8a-4a80-84a2      Dup       82341659 2022-09-21 07:03:14                     1 2022-09-20 07:03:47
19     88f8b058-8b8a-4a80-84a2      Dup       82305609 2022-09-20 07:03:47                     1 2022-09-21 07:03:14
  • Related