I am trying to match the following data:
df1(coming from big csv file):
Device Pool Directory Number 1
098 12228675309
098 12228675313
099 12228215486
100 12228843454
100 12350230910
...
with df2(smaller csv. location reference):
BID,ADDRESS,CITY
123,568 Kaput ST,Fredville
456,1542 Throwaway ave,Port
098,985 Garmin, Dadilville
459,982 Jarvis St, Harmville
099,0914 Kaplah, Klignon Peak
100,8581 Yuart,Harkplane
...
I have tried merging in various ways but I get data removed on the left, right or outer when I attempted. When the df2 data is attached, the whole contents merge without matching at all - just puts the whole list in. I have tried variations of merging, lambda, and using key values. To keep things short and to the point, this is what I am looking for between the two dataframes:
Device Pool Directory Number 1 ADDRESS CITY
098 12228675309 985 Garmin Dadilville
098 12228675313 985 Garmin Dadilville
099 12228215486 0914 Kaplah Klignon Peak
100 12228843454 8581 Yuart Harkplane
100 12350230910 8581 Yuart Harkplane
...
One of the attempts:
df_loc = pd.read_csv(path_loc)
dfk = (dfk.filter(items=['BID','ADDRESS','CITY']))
df1 = pd.read_csv(source_file, low_memory=False)
dff = (df1.filter(items=['Device Pool','Directory Number 1']))
dff = dff.loc[dff.iloc[:0].str.contains('[B][0-9][0-9][0-9]', regex=True)]
dff['Device Pool'] = dff['Device Pool'].str.replace (r'.*B(\d ).*',r'\1',regex=True)
dff = dff.merge(dfk,left_on='Device Pool','right_on='CITY',how='left')
Output:
Device Pool Directory Number 1 BID Address CITY
0 095 5989024012 NaN NaN NaN
1 095 5981523235 NaN NaN NaN
2 012 5981312311 NaN NaN NaN
3 053 5913215346 NaN NaN NaN
Been at this for awhile. Any help would be appreciated!
CodePudding user response:
There are several issues in your question.
- In
df1
it is not clear what are the column names. The separator is whitespace and some column names have several words. - You do not use merging correctly. As far as I understand, you want to join two dataframes using
Device Pool
column's values on the left andBID
on the right. The code in your case should look in the following way:
df1 = pd.DataFrame([
('098', '12228675309'),
('098', '12228675313'),
('099', '12228215486'),
('100', '12228843454'),
('100', '12350230910')
],
columns = ['Device Pool', 'Directory Number 1'],
)
df2 = pd.DataFrame([
('123', '568 Kaput ST', 'Fredville'),
('456', '1542 Throwaway ave', 'Port'),
('098', '985 Garmin', 'Dadilville'),
('459', '982 Jarvis St', 'Harmville'),
('099', '0914 Kaplah', 'Klignon Peak'),
('100', '8581 Yuart', 'Harkplane'),
],
columns=['BID','ADDRESS','CITY']
)
dff = pd.merge(left=df1, right=df2, left_on='Device Pool', right_on='BID')
The dff
will be the following:
Device Pool Directory Number 1 BID ADDRESS CITY
0 098 12228675309 098 985 Garmin Dadilville
1 098 12228675313 098 985 Garmin Dadilville
2 099 12228215486 099 0914 Kaplah Klignon Peak
3 100 12228843454 100 8581 Yuart Harkplane
4 100 12350230910 100 8581 Yuart Harkplane