I have two dataframes- df1 having columns like ISIN, Name, Weight and df2 having columns like Short Name, ISIN.
df1 =
ISIN Name Weight
Enbridge Inc 0.1
UDR Inc 1.1
Tyson Foods Inc 1.9
and df2=
Short Name ISIN
Enbridge Inc. bvefj154
UDR Group iuhb38g7
Tyson Foods Pvt Ltd. hruidf12
I have developed a fuzzy logic which will match Name
and Short Name
from df1 and df2.
so it using the logic it will know that Enbridge Inc from both dataframes are one and the same. and for UDR Group and UDR Inc they are also same since names are matching, not all but almost all.
I am looking for a way to populate the ISIN column
in df1 based on the logic that if the names match(Enbridge Inc matches in all) then select the ISIN for the respective Short Name
from df2 and add it to ISIN column in df1 wherever relevant name is present.
So the output that I look forward to would look like this: df1=
ISIN Name Weight
bvefj154 Enbridge Inc 0.1
iuhb38g7 UDR Inc 1.1
hruidf12 Tyson Foods Inc 1.9
using pandas's merge
function I tried achieving the task but got a an error like:
KeywordError:'Name'
here's the code for the same.
import pandas as pd
df1 = pd.merge(df1, df2, on=['Name', '% Weight'], how='right')
How can I do this? Please help.
EDIT: Here is the code for fuzzy logic matching using fuzzywuzzy
module
def fuzzy_merge(df_1, df_2, key1, key2, threshold=90, limit=1):
"""
:param df_1: the left table to join
:param df_2: the right table to join
:param key1: key column of the left table
:param key2: key column of the right table
:param threshold: how close the matches should be to return a match, based on Levenshtein distance
:param limit: the amount of matches that will get returned, these are sorted high to low
:return: dataframe with boths keys and matches
"""
s = df_2[key2].tolist()
m = df_1[key1].apply(lambda x: process.extract(x, s, limit=limit))
df_1['matches'] = m
m2 = df_1['matches'].apply(lambda x: ', '.join([i[0] for i in x if i[1] >= threshold]))
df_1['matches'] = m2
print(df_1)
df_1.to_csv('fuzzy-1390-match.csv')
#return df_1
fuzzy_merge(df1, df2, 'Name', 'Short Name', threshold=90)
Output:
5,Enbridge Inc Flt 07/15/80 Sr:20-A,0.0127,ENBRIDGE INC
6,Enbridge Inc. 6.25% 03/01/78,0.0122,ENBRIDGE INC
7,Emera 6.75% 6/15/76-26,0.0113,MERA
8,Scentre Group Trust 2 Flt 09/24/80 Sr:144A,0.011,SCENTRE GROUP
9,Credit Suisse Group AG 7.5 Perp,0.0106,
10,Aegon Funding Corp Ii 5.100% 12/15/49,0.0101,
11,Dte Energy Co 5.250% 12/01/77 Sr:E,0.01,DTE ENERGY CO
12,Dai-Ichi Life Insurance 4%,0.0099,
13,Southern Co Flt 09/15/51 Sr:21-A,0.0098,SOUTHERN CO
EDIT2:
This is the dataframe(df1 and df2): df1=
0 Transcanada Trust 5.875 08/15/76 0.0176
1 Bp Capital Markets Plc Flt Perp 0.0169
2 Transcanada Trust Flt 09/15/79 0.0169
3 Bp Capital Markets Plc Flt Perp 0.0155
4 Prudential Financial 5.375% 5/15/45 0.0150
5 Enbridge Inc Flt 07/15/80 Sr:20-A 0.0127
6 Enbridge Inc. 6.25% 03/01/78 0.0122
7 Emera 6.75% 6/15/76-26 0.0113
8 Scentre Group Trust 2 Flt 09/24/80 Sr:144A 0.0110
9 Credit Suisse Group AG 7.5 Perp 0.0106
10 Aegon Funding Corp Ii 5.100% 12/15/49 0.0101
11 Dte Energy Co 5.250% 12/01/77 Sr:E 0.0100
12 Dai-Ichi Life Insurance 4% 0.0099
13 Southern Co Flt 09/15/51 Sr:21-A 0.0098
14 Prudential Financial 5.625% 6/15/43 0.0097
15 Southern Co 4.950% 01/30/80 Sr:2020 0.0093
16 Scentre Group Trust 2 Flt 09/24/80 Sr:144A 0.0093
17 Metlife Inc 9.25% 4/8/2038 144A 0.0089
18 American Intl Group 8.175% 5/15/58 0.0086
19 Southern Co Flt 01/15/51 Sr:B 0.0079
df2=
Short Name ISIN
0 ABU DHABI COMMER AEA000201011
1 ABU DHABI NATION AEA002401015
2 ABU DHABI NATION AEA006101017
3 ADNOC DRILLING C AEA007301012
4 ALPHA DHABI HOLD AEA007601015
5 DUBAI ISLAMIC AED000201015
6 EMAAR PROP PJSC AEE000301011
7 ETISALAT AEE000401019
8 EMIRATES NBD PJS AEE000801010
9 INTL HOLDING CO AEI000201014
10 FIRST ABU DHABI AEN000101016
11 SCHLUMBERGER LTD AN8068571086
12 ERSTE GROUP BANK AT0000652011
13 OMV AG AT0000743059
14 VERBUND AG AT0000746409
15 ARISTOCRAT LEISU AU000000ALL7
16 AUST AND NZ BANK AU000000ANZ3
17 AFTERPAY LTD AU000000APT1
18 ASX LTD AU000000ASX7
19 BHP GROUP LTD AU000000BHP4
CodePudding user response:
You can use split to get the result. I search if the first word in df1.Name is in df2.Short Name
import pandas as pd
df1 = pd.DataFrame({'ISIN': ['', '', ''], 'Name': ['Enbridge Inc', 'UDR Inc', 'Tyson Foods Inc'], 'Weight': ['0.1', '1.1', '1.9']})
df2 = pd.DataFrame({'Short Name': ['Enbridge Inc.', 'UDR Group', 'Tyson Foods Pvt Ltd.'], 'ISIN': ['bvefj154', 'iuhb38g7', 'hruidf12']})
def strMergeData(strColumnDf1):
strColumnDf1 = strColumnDf1.split()[0]
for strColumnDf2 in df2['Short Name']:
if strColumnDf1 in strColumnDf2:
return df2[df2['Short Name'] == strColumnDf2]['ISIN'].values[0]
break
else:
pass
df1['ISIN'] = df1.apply(lambda x: strMergeData(x['Name']),axis=1)
print(df1)
Output :
ISIN Name Weight
0 bvefj154 Enbridge Inc 0.1
1 iuhb38g7 UDR Inc 1.1
2 hruidf12 Tyson Foods Inc 1.9
You find below the test of your example in the provided code. I just add TRANSCANADA in ALPHA DHABI HOLD to match some example.
import pandas as pd
df1 = pd.DataFrame({'ISIN': ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''],
'Name': ['Transcanada Trust 5.875 08/15/76',
'Bp Capital Markets Plc Flt Perp',
'Transcanada Trust Flt 09/15/79',
'Bp Capital Markets Plc Flt Perp',
'Prudential Financial 5.375% 5/15/45',
'Enbridge Inc Flt 07/15/80 Sr:20-A',
'Enbridge Inc. 6.25% 03/01/78',
'Emera 6.75% 6/15/76-26',
'Scentre Group Trust 2 Flt 09/24/80 Sr:144A',
'Credit Suisse Group AG 7.5 Perp',
'Aegon Funding Corp Ii 5.100% 12/15/49',
'Dte Energy Co 5.250% 12/01/77 Sr:E',
'Dai-Ichi Life Insurance 4%',
'Southern Co Flt 09/15/51 Sr:21-A',
'Prudential Financial 5.625% 6/15/43',
'Southern Co 4.950% 01/30/80 Sr:2020',
'Scentre Group Trust 2 Flt 09/24/80 Sr:144A',
'Metlife Inc 9.25% 4/8/2038 144A',
'American Intl Group 8.175% 5/15/58',
'Southern Co Flt 01/15/51 Sr:B',
19.5],
'Weight': [0.0176, 0.0169, 0.0169, 0.0155,0.0150,0.0127,0.0122,0.0113,0.0110,0.0106,0.0101,0.0100
,0.0099,0.0098,0.0097,0.0093,0.0093,0.0089,0.0086,0.0079,0.0091]})
df2 = pd.DataFrame({'Short Name': ['ABU DHABI COMMER', 'ABU DHABI NATION', 'ABU DHABI NATION',
'ADNOC DRILLING C','TRANSCANADA ALPHA DHABI HOLD','DUBAI ISLAMIC' ,
'EMAAR PROP PJSC','ETISALAT','EMIRATES NBD PJS','INTL HOLDING CO' ,
'FIRST ABU DHABI' ,'SCHLUMBERGER LTD' ,'ERSTE GROUP BANK' ,'OMV AG',
'VERBUND AG', 'ARISTOCRAT LEISU', 'AUST AND NZ BANK', 'AFTERPAY LTD',
'ASX LTD', 'BHP GROUP LTD',19.5],
'ISIN': [ 'AEA000201011','AEA002401015','AEA006101017','AEA007301012','AEA007601015',
'AED000201015','AEE000301011','AEE000401019','AEE000801010','AEI000201014',
'AEN000101016','AN8068571086','AT0000652011','AT0000743059','AT0000746409',
'AU000000ALL7','AU000000ANZ3','AU000000APT1','AU000000ASX7','AU000000BHP4','FLOAT_TEST'] })
def strMergeData(strColumnDf1):
strColumnDf1 = str(strColumnDf1).split()[0]
for strColumnDf2 in df2['Short Name']:
if str(strColumnDf1).upper() in str(strColumnDf2).upper():
return df2[df2['Short Name'] == strColumnDf2]['ISIN'].values[0]
break
else:
pass
df1['ISIN'] = df1.apply(lambda x: strMergeData(x['Name']),axis=1)
print(df1)
Output :
ISIN Name Weight
0 AEA007601015 Transcanada Trust 5.875 08/15/76 0.0176
1 None Bp Capital Markets Plc Flt Perp 0.0169
2 AEA007601015 Transcanada Trust Flt 09/15/79 0.0169
3 None Bp Capital Markets Plc Flt Perp 0.0155
4 None Prudential Financial 5.375% 5/15/45 0.0150
5 None Enbridge Inc Flt 07/15/80 Sr:20-A 0.0127
6 None Enbridge Inc. 6.25% 03/01/78 0.0122
7 None Emera 6.75% 6/15/76-26 0.0113
8 None Scentre Group Trust 2 Flt 09/24/80 Sr:144A 0.0110
9 None Credit Suisse Group AG 7.5 Perp 0.0106
10 None Aegon Funding Corp Ii 5.100% 12/15/49 0.0101
11 None Dte Energy Co 5.250% 12/01/77 Sr:E 0.0100
12 None Dai-Ichi Life Insurance 4% 0.0099
13 None Southern Co Flt 09/15/51 Sr:21-A 0.0098
14 None Prudential Financial 5.625% 6/15/43 0.0097
15 None Southern Co 4.950% 01/30/80 Sr:2020 0.0093
16 None Scentre Group Trust 2 Flt 09/24/80 Sr:144A 0.0093
17 None Metlife Inc 9.25% 4/8/2038 144A 0.0089
18 None American Intl Group 8.175% 5/15/58 0.0086
19 None Southern Co Flt 01/15/51 Sr:B 0.0079
20 FLOAT_TEST 19.5 0.0091