Home > Net >  Python Dataframe Duplicated Columns while Merging multple times
Python Dataframe Duplicated Columns while Merging multple times

Time:04-29

I have a main dataframe and a sub dataframe. I want to merge each column in sub dataframe into main dataframe with main dataframe column as a reference. I have successfully arrived at my desired answer, except that I see duplicated columns of the main dataframe. Below are the my expected and present answers.

Present solution:

df = pd.DataFrame({'Ref':[1,2,3,4]})
df1 = pd.DataFrame({'A':[2,3],'Z':[1,2]})

df = [df.merge(df1[col_name],left_on='Ref',right_on=col_name,how='left') for col_name in df1.columns]
df = pd.concat(df,axis=1)
df = 
   Ref    A   Ref    Z
0    1  NaN     1  1.0
1    2  2.0     2  2.0
2    3  3.0     3  NaN
3    4  NaN     4  NaN

Expected Answer:

df = 
   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN

CodePudding user response:

What about setting 'Ref' col as index while getting dataframe list. (And resetting index such that you get back Ref as a column)

df = pd.DataFrame({'Ref':[1,2,3,4]}) 
df1 = pd.DataFrame({'A':[2,3],'Z':[1,2]})


df = [df.merge(df1[col_name],left_on='Ref',right_on=col_name,how='left').set_index('Ref') for col_name in df1.columns]
df = pd.concat(df,axis=1)

df = df.reset_index()

        Ref  A   Z
        1   NaN 1.0
        2   2.0 2.0 
        3   3.0 NaN 
        4   NaN NaN

CodePudding user response:

This is a reduction process. Instead of the list comprehension use for - loop, or even reduce:

from functools import reduce

reduce(lambda x, y : x.merge(df1[y],left_on='Ref',right_on=y,how='left'), df1.columns, df)

   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN

The above is similar to:

for y in df1.columns:
    df = df.merge(df1[y],left_on='Ref',right_on=y,how='left')
df
 
   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN
   

CodePudding user response:

Update

Use duplicated:

>>> df.loc[:, ~df.columns.duplicated()]
   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN

Old answer

You can use:

# Your code
...
df = pd.concat(df, axis=1)

# Use pop and insert to cleanup your dataframe
df.insert(0, 'Ref', df.pop('Ref').iloc[:, 0])

Output:

>>> df
   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN
  • Related