I have dataframe like below:
df = pd.DataFrame.from_dict(
{
'vid':['v1','v1','v1','v1','v1','v2','v2','v2','v2'],
'Name': ['Jenny', 'Matt', 'Jenny', 'Jenny', 'Matt', 'Jenny', 'Matt', 'Jenny', 'Matt'],
'Year': [2020, 2021,2022, 2022, 2020, 2021, 2022, 2020, 2021],
'Income': [10000, 11000,90001, 9000, 12000, 13000, 11000, 14000, 15000],
'Gender': ['F', 'M', 'F', 'M','F', 'M', 'F', 'M', 'F']
}
)
print(df)
output:
vid Name Year Income Gender
0 v1 Jenny 2020 10000 F
1 v1 Matt 2021 11000 M
2 v1 Jenny 2022 90001 F
3 v1 Jenny 2022 9000 M
4 v1 Matt 2020 12000 F
5 v2 Jenny 2021 13000 M
6 v2 Matt 2022 11000 F
7 v2 Jenny 2020 14000 M
8 v2 Matt 2021 15000 F
i want to append two rows into single row by using Gender as well as need to groupby with vid. i dont bother on increasing the columns
My expected output should be:
vid Name Year Income Gender vid_1 Name_1 Year_1 Income_1 Gender_1
v1 Jenny 2020 10000 F v1 Matt 2021 11000 M
v1 Jenny 2022 90001 F v1 Jenny 2022 9000 M
v1 Matt 2020 12000 F NA NA NA NA NA
v2 Jenny 2021 13000 M v2 Matt 2022 11000 F
v2 Jenny 2020 14000 M v2 Matt 2021 15000 F
Any suggestions how to do this would be helpful.
CodePudding user response:
Use GroupBy.cumcount
for counter per vid
, create Multiindex
and reshape by DataFrame.unstack
, sorting columns and flatten MultiIndex
:
a = df.groupby('vid').cumcount()
df = (df.set_index([df['vid'], a // 2, a % 2])
.unstack()
.sort_index(level=1, axis=1, sort_remaining=False)
.reset_index(drop=True))
df.columns = [a if b == 0 else f'{a}_{b}' for a, b in df.columns]
print (df)
vid Name Year Income Gender vid_1 Name_1 Year_1 Income_1 Gender_1
0 v1 Jenny 2020.0 10000.0 F v1 Matt 2021.0 11000.0 M
1 v1 Jenny 2022.0 90001.0 F v1 Jenny 2022.0 9000.0 M
2 v1 Matt 2020.0 12000.0 F NaN NaN NaN NaN NaN
3 v2 Jenny 2021.0 13000.0 M v2 Matt 2022.0 11000.0 F
4 v2 Jenny 2020.0 14000.0 M v2 Matt 2021.0 15000.0 F