Home > OS >  Python/Pandas : How to self join a pandas dataframe on rows with same index
Python/Pandas : How to self join a pandas dataframe on rows with same index

Time:08-20

I have a dataframe that looks like below

merge_id identifier Location Value
1 A1 DEL 50
1 B2 HYD 60
2 C1 BEN 80
2 D2 HYD 10

I want the output dataframe to look like below

merge_id identifier Location Value m_identifier m_Location m_Value
1 A1 DEL 50 B2 HYD 60
2 C1 BEN 80 D2 HYD 10

Please can you suggest how I can do that

CodePudding user response:

This looks like a pivot with a few tweaks:

df2 = (df.assign(c=df.groupby('merge_id').cumcount())
         .pivot(index='merge_id', columns='c')
         .sort_index(level=1, sort_remaining=False, axis=1)
      )

df2.columns = df2.columns.map(lambda x: f'{"m_" if x[1] else ""}{x[0]}')

print(df2.reset_index())

output:

   merge_id identifier Location  Value m_identifier m_Location  m_Value
0         1         A1      DEL     50           B2        HYD       60
1         2         C1      BEN     80           D2        HYD       10

CodePudding user response:

here is one way about it

df2=df.merge(df.mask(df['identifier'].str.endswith('1')),
         on='merge_id',
          how='left',
        suffixes=(None,'_m'))
df2=df2.mask(df2['identifier'].eq(df2['identifier_m']))
df2.dropna()
    merge_id    identifier  Location    Value   identifier_m    Location_m  Value_m
0        1.0            A1       DEL     50.0             B2           HYD     60.0
2        2.0            C1       BEN     80.0             D2           HYD     10.0
  • Related