Home > front end >  How to append two rows from dataframe in a single row by groupby on a column using Python
How to append two rows from dataframe in a single row by groupby on a column using Python

Time:10-18

I have dataframe like below:

df = pd.DataFrame.from_dict(
    {
        'vid':['v1','v1','v1','v1','v1','v2','v2','v2','v2'],
        'Name': ['Jenny', 'Matt', 'Jenny', 'Jenny',  'Matt', 'Jenny',  'Matt', 'Jenny', 'Matt'],
        'Year': [2020, 2021,2022, 2022, 2020, 2021, 2022, 2020, 2021],
        'Income': [10000, 11000,90001, 9000, 12000, 13000, 11000, 14000, 15000],
        'Gender': ['F', 'M', 'F', 'M','F', 'M', 'F', 'M', 'F']
    }
)
print(df)

output:

 vid   Name  Year  Income Gender
0  v1  Jenny  2020   10000      F
1  v1   Matt  2021   11000      M
2  v1  Jenny  2022   90001      F
3  v1  Jenny  2022    9000      M
4  v1   Matt  2020   12000      F
5  v2  Jenny  2021   13000      M
6  v2   Matt  2022   11000      F
7  v2  Jenny  2020   14000      M
8  v2   Matt  2021   15000      F

i want to append two rows into single row by using Gender as well as need to groupby with vid. i dont bother on increasing the columns

My expected output should be:

 vid   Name  Year  Income Gender   vid_1  Name_1 Year_1 Income_1 Gender_1
  v1  Jenny  2020   10000      F      v1   Matt  2021   11000      M
  v1  Jenny  2022   90001      F      v1  Jenny  2022    9000      M
  v1   Matt  2020   12000      F      NA   NA     NA     NA        NA
  v2  Jenny  2021   13000      M      v2   Matt  2022   11000      F
  v2  Jenny  2020   14000      M      v2   Matt  2021   15000      F

Any suggestions how to do this would be helpful.

CodePudding user response:

Use GroupBy.cumcount for counter per vid, create Multiindex and reshape by DataFrame.unstack, sorting columns and flatten MultiIndex:

a = df.groupby('vid').cumcount()
df = (df.set_index([df['vid'], a // 2, a % 2])
        .unstack()
        .sort_index(level=1, axis=1, sort_remaining=False)
        .reset_index(drop=True))

df.columns = [a if b == 0 else f'{a}_{b}' for a, b in df.columns]

print (df)
  vid   Name    Year   Income Gender vid_1 Name_1  Year_1  Income_1 Gender_1
0  v1  Jenny  2020.0  10000.0      F    v1   Matt  2021.0   11000.0        M
1  v1  Jenny  2022.0  90001.0      F    v1  Jenny  2022.0    9000.0        M
2  v1   Matt  2020.0  12000.0      F   NaN    NaN     NaN       NaN      NaN
3  v2  Jenny  2021.0  13000.0      M    v2   Matt  2022.0   11000.0        F
4  v2  Jenny  2020.0  14000.0      M    v2   Matt  2021.0   15000.0        F
  • Related