Home > Net >  Conditional merging of two dataframes using pandas
Conditional merging of two dataframes using pandas

Time:12-14

I have two dataframes- df1 having columns like ISIN, Name, Weight and df2 having columns like Short Name, ISIN.

df1 =

ISIN    Name            Weight
        Enbridge Inc    0.1
        UDR Inc         1.1
        Tyson Foods Inc 1.9

and df2=

Short Name            ISIN
Enbridge Inc.         bvefj154
UDR Group             iuhb38g7
Tyson Foods Pvt Ltd.  hruidf12

I have developed a fuzzy logic which will match Name and Short Name from df1 and df2. so it using the logic it will know that Enbridge Inc from both dataframes are one and the same. and for UDR Group and UDR Inc they are also same since names are matching, not all but almost all.

I am looking for a way to populate the ISIN column in df1 based on the logic that if the names match(Enbridge Inc matches in all) then select the ISIN for the respective Short Name from df2 and add it to ISIN column in df1 wherever relevant name is present.

So the output that I look forward to would look like this: df1=

ISIN            Name            Weight
bvefj154        Enbridge Inc    0.1
iuhb38g7        UDR Inc         1.1
hruidf12        Tyson Foods Inc 1.9

using pandas's merge function I tried achieving the task but got a an error like:

KeywordError:'Name'

here's the code for the same.

import pandas as pd
df1 = pd.merge(df1, df2, on=['Name', '% Weight'], how='right')

How can I do this? Please help.

EDIT: Here is the code for fuzzy logic matching using fuzzywuzzy module

def fuzzy_merge(df_1, df_2, key1, key2, threshold=90, limit=1):
    """
    :param df_1: the left table to join
    :param df_2: the right table to join
    :param key1: key column of the left table
    :param key2: key column of the right table
    :param threshold: how close the matches should be to return a match, based on Levenshtein distance
    :param limit: the amount of matches that will get returned, these are sorted high to low
    :return: dataframe with boths keys and matches
    """
    s = df_2[key2].tolist()

    m = df_1[key1].apply(lambda x: process.extract(x, s, limit=limit))
    df_1['matches'] = m

    m2 = df_1['matches'].apply(lambda x: ', '.join([i[0] for i in x if i[1] >= threshold]))
    df_1['matches'] = m2

    print(df_1)
    df_1.to_csv('fuzzy-1390-match.csv')
    #return df_1


fuzzy_merge(df1, df2, 'Name', 'Short Name', threshold=90)

Output:

5,Enbridge Inc Flt 07/15/80 Sr:20-A,0.0127,ENBRIDGE INC
6,Enbridge Inc. 6.25% 03/01/78,0.0122,ENBRIDGE INC
7,Emera 6.75% 6/15/76-26,0.0113,MERA
8,Scentre Group Trust 2 Flt 09/24/80 Sr:144A,0.011,SCENTRE GROUP
9,Credit Suisse Group AG 7.5 Perp,0.0106,
10,Aegon Funding Corp Ii 5.100% 12/15/49,0.0101,
11,Dte Energy Co 5.250% 12/01/77 Sr:E,0.01,DTE ENERGY CO
12,Dai-Ichi Life Insurance 4%,0.0099,
13,Southern Co Flt 09/15/51 Sr:21-A,0.0098,SOUTHERN CO

EDIT2:

This is the dataframe(df1 and df2): df1=

0             Transcanada Trust 5.875 08/15/76    0.0176
1              Bp Capital Markets Plc Flt Perp    0.0169
2               Transcanada Trust Flt 09/15/79    0.0169
3              Bp Capital Markets Plc Flt Perp    0.0155
4          Prudential Financial 5.375% 5/15/45    0.0150
5            Enbridge Inc Flt 07/15/80 Sr:20-A    0.0127
6                 Enbridge Inc. 6.25% 03/01/78    0.0122
7                       Emera 6.75% 6/15/76-26    0.0113
8   Scentre Group Trust 2 Flt 09/24/80 Sr:144A    0.0110
9              Credit Suisse Group AG 7.5 Perp    0.0106
10       Aegon Funding Corp Ii 5.100% 12/15/49    0.0101
11          Dte Energy Co 5.250% 12/01/77 Sr:E    0.0100
12                  Dai-Ichi Life Insurance 4%    0.0099
13            Southern Co Flt 09/15/51 Sr:21-A    0.0098
14         Prudential Financial 5.625% 6/15/43    0.0097
15         Southern Co 4.950% 01/30/80 Sr:2020    0.0093
16  Scentre Group Trust 2 Flt 09/24/80 Sr:144A    0.0093
17             Metlife Inc 9.25% 4/8/2038 144A    0.0089
18          American Intl Group 8.175% 5/15/58    0.0086
19               Southern Co Flt 01/15/51 Sr:B    0.0079

df2=

          Short Name          ISIN
0   ABU DHABI COMMER  AEA000201011
1   ABU DHABI NATION  AEA002401015
2   ABU DHABI NATION  AEA006101017
3   ADNOC DRILLING C  AEA007301012
4   ALPHA DHABI HOLD  AEA007601015
5      DUBAI ISLAMIC  AED000201015
6    EMAAR PROP PJSC  AEE000301011
7           ETISALAT  AEE000401019
8   EMIRATES NBD PJS  AEE000801010
9    INTL HOLDING CO  AEI000201014
10   FIRST ABU DHABI  AEN000101016
11  SCHLUMBERGER LTD  AN8068571086
12  ERSTE GROUP BANK  AT0000652011
13            OMV AG  AT0000743059
14        VERBUND AG  AT0000746409
15  ARISTOCRAT LEISU  AU000000ALL7
16  AUST AND NZ BANK  AU000000ANZ3
17      AFTERPAY LTD  AU000000APT1
18           ASX LTD  AU000000ASX7
19     BHP GROUP LTD  AU000000BHP4

CodePudding user response:

You can use split to get the result. I search if the first word in df1.Name is in df2.Short Name

import pandas as pd

df1 = pd.DataFrame({'ISIN': ['', '', ''], 'Name': ['Enbridge Inc', 'UDR Inc', 'Tyson Foods Inc'], 'Weight': ['0.1', '1.1', '1.9']})
df2 = pd.DataFrame({'Short Name': ['Enbridge Inc.', 'UDR Group', 'Tyson Foods Pvt Ltd.'], 'ISIN': ['bvefj154', 'iuhb38g7', 'hruidf12']})

def strMergeData(strColumnDf1):
    strColumnDf1 = strColumnDf1.split()[0]
    for strColumnDf2 in df2['Short Name']:
        if strColumnDf1 in strColumnDf2:
            return df2[df2['Short Name'] == strColumnDf2]['ISIN'].values[0]
            break
        else:
            pass
        
df1['ISIN'] = df1.apply(lambda x: strMergeData(x['Name']),axis=1)
print(df1)

Output :

       ISIN             Name Weight
0  bvefj154     Enbridge Inc    0.1
1  iuhb38g7          UDR Inc    1.1
2  hruidf12  Tyson Foods Inc    1.9

Demo


You find below the test of your example in the provided code. I just add TRANSCANADA in ALPHA DHABI HOLD to match some example.

import pandas as pd

df1 = pd.DataFrame({'ISIN': ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''],
                    'Name': ['Transcanada Trust 5.875 08/15/76',
                             'Bp Capital Markets Plc Flt Perp',
                             'Transcanada Trust Flt 09/15/79',
                             'Bp Capital Markets Plc Flt Perp',
                             'Prudential Financial 5.375% 5/15/45',
                             'Enbridge Inc Flt 07/15/80 Sr:20-A',
                             'Enbridge Inc. 6.25% 03/01/78',
                             'Emera 6.75% 6/15/76-26',
                             'Scentre Group Trust 2 Flt 09/24/80 Sr:144A',
                             'Credit Suisse Group AG 7.5 Perp',
                             'Aegon Funding Corp Ii 5.100% 12/15/49',
                             'Dte Energy Co 5.250% 12/01/77 Sr:E',
                             'Dai-Ichi Life Insurance 4%',
                             'Southern Co Flt 09/15/51 Sr:21-A',
                             'Prudential Financial 5.625% 6/15/43',
                             'Southern Co 4.950% 01/30/80 Sr:2020',
                             'Scentre Group Trust 2 Flt 09/24/80 Sr:144A',
                             'Metlife Inc 9.25% 4/8/2038 144A',
                             'American Intl Group 8.175% 5/15/58',
                             'Southern Co Flt 01/15/51 Sr:B',
                             19.5],
                    'Weight': [0.0176, 0.0169, 0.0169, 0.0155,0.0150,0.0127,0.0122,0.0113,0.0110,0.0106,0.0101,0.0100
                               ,0.0099,0.0098,0.0097,0.0093,0.0093,0.0089,0.0086,0.0079,0.0091]})

df2 = pd.DataFrame({'Short Name': ['ABU DHABI COMMER', 'ABU DHABI NATION', 'ABU DHABI NATION',
                                   'ADNOC DRILLING C','TRANSCANADA ALPHA DHABI HOLD','DUBAI ISLAMIC' ,
                                   'EMAAR PROP PJSC','ETISALAT','EMIRATES NBD PJS','INTL HOLDING CO' ,
                                   'FIRST ABU DHABI'  ,'SCHLUMBERGER LTD'  ,'ERSTE GROUP BANK'  ,'OMV AG',
                                   'VERBUND AG',  'ARISTOCRAT LEISU',  'AUST AND NZ BANK',  'AFTERPAY LTD',
                                   'ASX LTD',  'BHP GROUP LTD',19.5],
                    'ISIN': [ 'AEA000201011','AEA002401015','AEA006101017','AEA007301012','AEA007601015',
                              'AED000201015','AEE000301011','AEE000401019','AEE000801010','AEI000201014',
                              'AEN000101016','AN8068571086','AT0000652011','AT0000743059','AT0000746409',
                              'AU000000ALL7','AU000000ANZ3','AU000000APT1','AU000000ASX7','AU000000BHP4','FLOAT_TEST'] })

def strMergeData(strColumnDf1):
    strColumnDf1 = str(strColumnDf1).split()[0]
    for strColumnDf2 in df2['Short Name']:
        if str(strColumnDf1).upper() in str(strColumnDf2).upper():
            return df2[df2['Short Name'] == strColumnDf2]['ISIN'].values[0]
            break
        else:
            pass
        
df1['ISIN'] = df1.apply(lambda x: strMergeData(x['Name']),axis=1)
print(df1)

Output :

            ISIN                                        Name  Weight
0   AEA007601015            Transcanada Trust 5.875 08/15/76  0.0176
1           None             Bp Capital Markets Plc Flt Perp  0.0169
2   AEA007601015              Transcanada Trust Flt 09/15/79  0.0169
3           None             Bp Capital Markets Plc Flt Perp  0.0155
4           None         Prudential Financial 5.375% 5/15/45  0.0150
5           None           Enbridge Inc Flt 07/15/80 Sr:20-A  0.0127
6           None                Enbridge Inc. 6.25% 03/01/78  0.0122
7           None                      Emera 6.75% 6/15/76-26  0.0113
8           None  Scentre Group Trust 2 Flt 09/24/80 Sr:144A  0.0110
9           None             Credit Suisse Group AG 7.5 Perp  0.0106
10          None       Aegon Funding Corp Ii 5.100% 12/15/49  0.0101
11          None          Dte Energy Co 5.250% 12/01/77 Sr:E  0.0100
12          None                  Dai-Ichi Life Insurance 4%  0.0099
13          None            Southern Co Flt 09/15/51 Sr:21-A  0.0098
14          None         Prudential Financial 5.625% 6/15/43  0.0097
15          None         Southern Co 4.950% 01/30/80 Sr:2020  0.0093
16          None  Scentre Group Trust 2 Flt 09/24/80 Sr:144A  0.0093
17          None             Metlife Inc 9.25% 4/8/2038 144A  0.0089
18          None          American Intl Group 8.175% 5/15/58  0.0086
19          None               Southern Co Flt 01/15/51 Sr:B  0.0079
20    FLOAT_TEST                                        19.5  0.0091

Demo

  • Related