I have two dataframes. One data is an annexe explaining the code and the other is the mining information. I need the data connection of the second data to extract the information according to that code in the first dataframe.
Table 1:
Numero | Date | Version | Origine | Destination | Other |
---|---|---|---|---|---|
1111 | 01/11/2009 | 5 | 0065;686006;AN | 0065;74590;AT | 0065;686006;AN |
1111 | 01/11/2009 | 5 | 0065;686006;AN | 0065;74590;AT | 0065;686006;P6 |
1111 | 01/11/2009 | 5 | 0065;686006;AN | 0065;74590;AT | 0065;6866675;1P |
Table 2:
Code 1 | Code 2 | Info 1 | Info 2 | Date 1 | Date 2 |
---|---|---|---|---|---|
000654 | LO | Mcdo | London | 01/02/2002 | 01/02/2021 |
000654 | AN | Mcdo | NYC | 01/02/2002 | 01/02/2021 |
686006 | 1P | KFC | Paris | 01/02/2002 | 01/02/2021 |
686006 | AN | KFC | NYC | 01/02/2002 | 01/02/2021 |
074590 | AT | Start | LA | 01/02/2002 | 01/02/2021 |
688975 | AN | Hight | NYC | 01/02/2002 | 01/02/2021 |
In "Origine", "Destination", "Other": numero order, code 1, code 2.
So I need to join two table with conditions "Code 1", "Code 2" to give informations orders "Info 1" and "info 2".
Output:
Numero | Date | Version | Origine | Destination | Other | Info Origine | Info Destination | Info Other |
---|---|---|---|---|---|---|---|---|
1111 | 01/11/2009 | 5 | 0065;686006;AN | 0065;74590;AT | 0065;686006;AN | KFC NYC | Start LA | KFC NYC |
1111 | 01/11/2009 | 5 | 0065;686006;AN | 0065;74590;AT | 0065;686006;P6 | KFC NYC | Start LA | |
1111 | 01/11/2009 | 5 | 0065;686006;AN | 0065;74590;AT | 0065;6866675;1P | KFC NYC | Start LA |
PLEASE HELP ME! THANKS YOU SO MUCH!
CodePudding user response:
This is a long way to do this but it's clear in it's steps. Others may adapt to be more efficient. See comment within merge area - your data isn't the same format so you have to account for leading zeroes if you go with this method.
Basically, I create common columns in each table and perform multiple merges, cleaning the tables as I go.
import io
d1 = '''Numero Date Version Origine Destination Other
1111 01/11/2009 5 0065;686006;AN 0065;74590;AT 0065;686006;AN
1111 01/11/2009 5 0065;686006;AN 0065;74590;AT 0065;686006;P6
1111 01/11/2009 5 0065;686006;AN 0065;74590;AT 0065;6866675;1P'''
df1 = pd.read_csv(io.StringIO(d1), sep=' \s ', engine='python')
df1['OrigineCode1Code2'] = df1['Origine'].str.split(';', 1).str[1]
df1['DestinationCode1Code2'] = df1['Destination'].str.split(';', 1).str[1]
df1['OtherCode1Code2'] = df1['Other'].str.split(';', 1).str[1]
d2 = '''Code 1 Code 2 Info 1 Info 2 Date 1 Date 2
000654 LO Mcdo London 01/02/2002 01/02/2021
000654 AN Mcdo NYC 01/02/2002 01/02/2021
686006 1P KFC Paris 01/02/2002 01/02/2021
686006 AN KFC NYC 01/02/2002 01/02/2021
074590 AT Start LA 01/02/2002 01/02/2021
688975 AN Hight NYC 01/02/2002 01/02/2021'''
df2 = pd.read_csv(io.StringIO(d2), sep=' \s ', engine='python', dtype='str')
df2['Code1Code2'] = df2.apply( lambda x: ';'.join([x['Code 1'], x['Code 2']]), axis=1)
df2['Info1Info2'] = df2.apply( lambda x: ' '.join([x['Info 1'], x['Info 2']]), axis=1)
df = df1.merge(df2[['Code1Code2', 'Info1Info2']], left_on='OrigineCode1Code2', right_on='Code1Code2', how='left')
df.drop(['OrigineCode1Code2', 'Code1Code2'], axis=1, inplace=True)
# NOTE: 074590 in table 1 is different from table 2 due to leading zero; you'll have to fix that somehow as you
# won't get any matches during the merge
df = df.merge(df2[['Code1Code2', 'Info1Info2']], left_on='DestinationCode1Code2', right_on='Code1Code2', how='left')
df.drop(['DestinationCode1Code2', 'Code1Code2'], axis=1, inplace=True)
df = df.merge(df2[['Code1Code2', 'Info1Info2']], left_on='OtherCode1Code2', right_on='Code1Code2', how='left')
df.drop(['OtherCode1Code2', 'Code1Code2'], axis=1, inplace=True)
df.rename(columns={'Info1Info2_x':'Info Origine', 'Info1Info2_y':'Info Destination', 'Info1Info2':'Info Other'}, inplace=True)
df
Output
Numero Date Version Origine Destination Other Info Origine Info Destination Info Other
0 1111 01/11/2009 5 0065;686006;AN 0065;74590;AT 0065;686006;AN KFC NYC NaN KFC NYC
1 1111 01/11/2009 5 0065;686006;AN 0065;74590;AT 0065;686006;P6 KFC NYC NaN NaN
2 1111 01/11/2009 5 0065;686006;AN 0065;74590;AT 0065;6866675;1P KFC NYC NaN NaN