Home > front end >  Join two data frames, with one column in a dataframe having multivalue data
Join two data frames, with one column in a dataframe having multivalue data

Time:01-15

I have two csv files as :

    Name    ID
0   Jack    1|2|3
1   Mac 4|5
2   Turtle  6|8
3   Rosh    9||10

    Id  Address
0   1   Adr1
1   2   Adr2
2   3   Adr3
3   4   Adr4
4   5   Adr5
5   6   Adr6
6   7   Adr7
7   8   Adr8
8   9   Adr9
9   10  Adr10

How do I join both of them based on ID value using dataframe and get the output as below:

    Name    ID
0   Jack    Adr1|Adr2|Adr3
1   Mac Adr4|Adr5
2   Turtle  Adr6|Adr8
3   Rosh    Adr9||Adr10

Solution I am trying is to read both files separately using pandas.read_csv and then for the first dataframe iterate the rows:

for i,j in df_first_file.iterrows(): x = j['ID'].split('|') for val in x : print(val)

But after that I am struggling to join it with the other dataframe as its now a string after iterating it through the rows

CodePudding user response:

Use:

#create DataFrames
df1 = pd.read_csv(file1)
df2 = pd.read_csv(file2)

#created dicts
df2['Id'] = df2['Id'].astype(str)
d = df2.set_index('Id')['Address'].to_dict()

#mapping with default empty string splitted values and last join by |
df1['ID'] = df1['ID'].apply(lambda x: '|'.join(d.get(y, '') for y in x.split('|')))

CodePudding user response:

You could split and explode the ID column to be able to merge both dataframes:

df = df1.assign(ID = df1['ID'].str.split('|')).explode('ID').merge(
    df2.astype(str), left_on='ID', right_on='Id', how='left').fillna('')
df = df[['Name', 'Address']].groupby('Name').agg(list).reset_index()
df['Address'] = df['Address'].transform('|'.join)

It gives as expected:

     Name         Address
0    Jack  Adr1|Adr2|Adr3
1     Mac       Adr4|Adr5
2    Rosh     Adr9||Adr10
3  Turtle       Adr6|Adr8
  •  Tags:  
  • Related