I have dataframe like below:
import pandas as pd
df = pd.DataFrame.from_dict(
{
'Name': ['Jenny', 'Matt', 'Jenny', 'Matt', 'Jenny', 'Matt', 'Jenny', 'Matt'],
'Year': [2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021],
'Income': [10000, 11000, 9000, 12000, 13000, 11000, 14000, 15000],
'Gender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M']
}
)
print(df)
Output:
Name Year Income Gender
0 Jenny 2020 10000 F
1 Matt 2021 11000 M
2 Jenny 2022 9000 F
3 Matt 2020 12000 M
4 Jenny 2021 13000 F
5 Matt 2022 11000 M
6 Jenny 2020 14000 F
7 Matt 2021 15000 M
i want to append two rows into single row by using Gender (both F and M should be in one row). i dont bother on increasing the columns
My expected output should be:
Name Year Income Gender Name1 Year1 Income1 Gender1
Jenny 2020 10000 F Matt 2021 11000 M
Jenny 2022 9000 F Matt 2020 12000 M
Jenny 2021 13000 F Matt 2022 11000 M
Jenny 2020 14000 F Matt 2021 15000 M
Any suggestions how to do this would be helpful.
CodePudding user response:
Split the dataframe into two dataframes (one for Gender = 'F'
and one for Gender = 'M'
) and concatenate them:
df1 = df[df.Gender == 'F'].reset_index(drop=True)
df2 = df[df.Gender == 'M'].reset_index(drop=True)
df2.columns = [i '1' for i in df.columns]
result = pd.concat([df1, df2], axis=1)
print(result)
Output:
Name Year Income Gender Name1 Year1 Income1 Gender1
0 Jenny 2020 10000 F Matt 2021 11000 M
1 Jenny 2022 9000 F Matt 2020 12000 M
2 Jenny 2021 13000 F Matt 2022 11000 M
3 Jenny 2020 14000 F Matt 2021 15000 M
CodePudding user response:
Use DataFrame.pivot
with helper columns by DataFrame.assign
with counter by GroupBy.cumcount
and factorize
, then sorting MultiIndex
and flatten in list comprehension:
df1 = (df.assign(g=df.groupby('Gender').cumcount(),
new=pd.factorize(df['Gender'])[0])
.pivot('g','new').sort_index(level=1, axis=1, sort_remaining=False))
df1.columns = [a if b==0 else f'{a}{b}' for a, b in df1.columns]
print (df1)
Name Year Income Gender Name1 Year1 Income1 Gender1
g
0 Jenny 2020 10000 F Matt 2021 11000 M
1 Jenny 2022 9000 F Matt 2020 12000 M
2 Jenny 2021 13000 F Matt 2022 11000 M
3 Jenny 2020 14000 F Matt 2021 15000 M
If need use Name
s for splitting:
df1 = (df.assign(g=df.groupby('Name').cumcount(),
new=pd.factorize(df['Name'])[0])
.pivot('g','new').sort_index(level=1, axis=1, sort_remaining=False))
df1.columns = [a if b==0 else f'{a}{b}' for a, b in df1.columns]
CodePudding user response:
To achieve your desire output we can use pandas concat function. by splitting your main data frame into two by Gender. To concat as rows we will reset the index of both slicer data.
Secondly, I created the loop where its check if column name is duplicate or not, if it will be duplicated will add addtional string '_1'.
Code:
df = pd.concat([df[df['Gender']=='M'].reset_index(drop=True),df[df['Gender']=='F'].reset_index(drop=True)],axis=1)
df.columns = [name if duplicated == False else name '_1' for duplicated, name in zip(df.columns.duplicated(), df.columns)]
df
Output:
Name Year Income Gender Name_1 Year_1 Income_1 Gender_1
0 Matt 2021 11000 M Jenny 2020 10000 F
1 Matt 2020 12000 M Jenny 2022 9000 F
2 Matt 2022 11000 M Jenny 2021 13000 F
3 Matt 2021 15000 M Jenny 2020 14000 F