I have two csv files as :
Name ID
0 Jack 1|2|3
1 Mac 4|5
2 Turtle 6|8
3 Rosh 9||10
Id Address
0 1 Adr1
1 2 Adr2
2 3 Adr3
3 4 Adr4
4 5 Adr5
5 6 Adr6
6 7 Adr7
7 8 Adr8
8 9 Adr9
9 10 Adr10
How do I join both of them based on ID value using dataframe and get the output as below:
Name ID
0 Jack Adr1|Adr2|Adr3
1 Mac Adr4|Adr5
2 Turtle Adr6|Adr8
3 Rosh Adr9||Adr10
Solution I am trying is to read both files separately using pandas.read_csv and then for the first dataframe iterate the rows:
for i,j in df_first_file.iterrows(): x = j['ID'].split('|') for val in x : print(val)
But after that I am struggling to join it with the other dataframe as its now a string after iterating it through the rows
CodePudding user response:
Use:
#create DataFrames
df1 = pd.read_csv(file1)
df2 = pd.read_csv(file2)
#created dicts
df2['Id'] = df2['Id'].astype(str)
d = df2.set_index('Id')['Address'].to_dict()
#mapping with default empty string splitted values and last join by |
df1['ID'] = df1['ID'].apply(lambda x: '|'.join(d.get(y, '') for y in x.split('|')))
CodePudding user response:
You could split and explode the ID column to be able to merge both dataframes:
df = df1.assign(ID = df1['ID'].str.split('|')).explode('ID').merge(
df2.astype(str), left_on='ID', right_on='Id', how='left').fillna('')
df = df[['Name', 'Address']].groupby('Name').agg(list).reset_index()
df['Address'] = df['Address'].transform('|'.join)
It gives as expected:
Name Address
0 Jack Adr1|Adr2|Adr3
1 Mac Adr4|Adr5
2 Rosh Adr9||Adr10
3 Turtle Adr6|Adr8