Home > Enterprise >  Python: joining two dataframe with multiple conditions
Python: joining two dataframe with multiple conditions

Time:10-04

I have two dataframes. One data is an annexe explaining the code and the other is the mining information. I need the data connection of the second data to extract the information according to that code in the first dataframe.

Table 1:

Numero Date Version Origine Destination Other
1111 01/11/2009 5 0065;686006;AN 0065;74590;AT 0065;686006;AN
1111 01/11/2009 5 0065;686006;AN 0065;74590;AT 0065;686006;P6
1111 01/11/2009 5 0065;686006;AN 0065;74590;AT 0065;6866675;1P

Table 2:

Code 1 Code 2 Info 1 Info 2 Date 1 Date 2
000654 LO Mcdo London 01/02/2002 01/02/2021
000654 AN Mcdo NYC 01/02/2002 01/02/2021
686006 1P KFC Paris 01/02/2002 01/02/2021
686006 AN KFC NYC 01/02/2002 01/02/2021
074590 AT Start LA 01/02/2002 01/02/2021
688975 AN Hight NYC 01/02/2002 01/02/2021

In "Origine", "Destination", "Other": numero order, code 1, code 2.

So I need to join two table with conditions "Code 1", "Code 2" to give informations orders "Info 1" and "info 2".

Output:

Numero Date Version Origine Destination Other Info Origine Info Destination Info Other
1111 01/11/2009 5 0065;686006;AN 0065;74590;AT 0065;686006;AN KFC NYC Start LA KFC NYC
1111 01/11/2009 5 0065;686006;AN 0065;74590;AT 0065;686006;P6 KFC NYC Start LA
1111 01/11/2009 5 0065;686006;AN 0065;74590;AT 0065;6866675;1P KFC NYC Start LA

PLEASE HELP ME! THANKS YOU SO MUCH!

CodePudding user response:

This is a long way to do this but it's clear in it's steps. Others may adapt to be more efficient. See comment within merge area - your data isn't the same format so you have to account for leading zeroes if you go with this method.

Basically, I create common columns in each table and perform multiple merges, cleaning the tables as I go.

import io

d1 = '''Numero  Date    Version Origine Destination Other
1111    01/11/2009  5   0065;686006;AN  0065;74590;AT   0065;686006;AN
1111    01/11/2009  5   0065;686006;AN  0065;74590;AT   0065;686006;P6
1111    01/11/2009  5   0065;686006;AN  0065;74590;AT   0065;6866675;1P'''

df1 = pd.read_csv(io.StringIO(d1), sep=' \s ', engine='python')
df1['OrigineCode1Code2'] = df1['Origine'].str.split(';', 1).str[1]
df1['DestinationCode1Code2'] = df1['Destination'].str.split(';', 1).str[1]
df1['OtherCode1Code2'] = df1['Other'].str.split(';', 1).str[1]


d2 = '''Code 1  Code 2  Info 1  Info 2  Date 1  Date 2
000654  LO  Mcdo    London  01/02/2002  01/02/2021
000654  AN  Mcdo    NYC 01/02/2002  01/02/2021
686006  1P  KFC Paris   01/02/2002  01/02/2021
686006  AN  KFC NYC 01/02/2002  01/02/2021
074590  AT  Start   LA  01/02/2002  01/02/2021
688975  AN  Hight   NYC 01/02/2002  01/02/2021'''

df2 = pd.read_csv(io.StringIO(d2), sep=' \s ', engine='python', dtype='str')
df2['Code1Code2'] = df2.apply( lambda x: ';'.join([x['Code 1'], x['Code 2']]), axis=1)
df2['Info1Info2'] = df2.apply( lambda x: ' '.join([x['Info 1'], x['Info 2']]), axis=1)

df = df1.merge(df2[['Code1Code2', 'Info1Info2']], left_on='OrigineCode1Code2', right_on='Code1Code2', how='left')
df.drop(['OrigineCode1Code2', 'Code1Code2'], axis=1, inplace=True)

# NOTE: 074590 in table 1 is different from table 2 due to leading zero; you'll have to fix that somehow as you
# won't get any matches during the merge
df = df.merge(df2[['Code1Code2', 'Info1Info2']], left_on='DestinationCode1Code2', right_on='Code1Code2', how='left')
df.drop(['DestinationCode1Code2', 'Code1Code2'], axis=1, inplace=True)

df = df.merge(df2[['Code1Code2', 'Info1Info2']], left_on='OtherCode1Code2', right_on='Code1Code2', how='left')
df.drop(['OtherCode1Code2', 'Code1Code2'], axis=1, inplace=True)

df.rename(columns={'Info1Info2_x':'Info Origine', 'Info1Info2_y':'Info Destination', 'Info1Info2':'Info Other'}, inplace=True)
df

Output

   Numero        Date  Version         Origine    Destination            Other Info Origine Info Destination Info Other
0    1111  01/11/2009        5  0065;686006;AN  0065;74590;AT   0065;686006;AN      KFC NYC              NaN    KFC NYC
1    1111  01/11/2009        5  0065;686006;AN  0065;74590;AT   0065;686006;P6      KFC NYC              NaN        NaN
2    1111  01/11/2009        5  0065;686006;AN  0065;74590;AT  0065;6866675;1P      KFC NYC              NaN        NaN
  • Related