How to append two rows from dataframe in a single row?-CodePudding

I have dataframe like below:

import pandas as pd

df = pd.DataFrame.from_dict(
    {
        'Name': ['Jenny', 'Matt', 'Jenny',  'Matt', 'Jenny',  'Matt', 'Jenny', 'Matt'],
        'Year': [2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021],
        'Income': [10000, 11000, 9000, 12000, 13000, 11000, 14000, 15000],
        'Gender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M']
    }
)
print(df)

Output:

     Name  Year  Income Gender
0  Jenny  2020   10000      F
1   Matt  2021   11000      M
2  Jenny  2022    9000      F
3   Matt  2020   12000      M
4  Jenny  2021   13000      F
5   Matt  2022   11000      M
6  Jenny  2020   14000      F
7   Matt  2021   15000      M

i want to append two rows into single row by using Gender (both F and M should be in one row). i dont bother on increasing the columns

My expected output should be:

 Name  Year  Income Gender  Name1  Year1  Income1 Gender1
Jenny  2020   10000      F Matt  2021   11000      M
Jenny  2022    9000      F Matt  2020   12000      M
Jenny  2021   13000      F Matt  2022   11000      M
Jenny  2020   14000      F Matt  2021   15000      M

Any suggestions how to do this would be helpful.

CodePudding user response：

Split the dataframe into two dataframes (one for Gender = 'F' and one for Gender = 'M') and concatenate them:

df1 = df[df.Gender == 'F'].reset_index(drop=True)
df2 = df[df.Gender == 'M'].reset_index(drop=True)
df2.columns = [i   '1' for i in df.columns]

result = pd.concat([df1, df2], axis=1)

print(result)

Output:

    Name  Year  Income Gender Name1  Year1  Income1 Gender1
0  Jenny  2020   10000      F  Matt   2021    11000       M
1  Jenny  2022    9000      F  Matt   2020    12000       M
2  Jenny  2021   13000      F  Matt   2022    11000       M
3  Jenny  2020   14000      F  Matt   2021    15000       M

CodePudding user response：

Use DataFrame.pivot with helper columns by DataFrame.assign with counter by GroupBy.cumcount and factorize, then sorting MultiIndex and flatten in list comprehension:

df1 = (df.assign(g=df.groupby('Gender').cumcount(), 
               new=pd.factorize(df['Gender'])[0])
        .pivot('g','new').sort_index(level=1, axis=1, sort_remaining=False))
df1.columns = [a if b==0 else f'{a}{b}' for a, b in df1.columns]
print (df1)
    Name  Year  Income Gender Name1  Year1  Income1 Gender1
g                                                          
0  Jenny  2020   10000      F  Matt   2021    11000       M
1  Jenny  2022    9000      F  Matt   2020    12000       M
2  Jenny  2021   13000      F  Matt   2022    11000       M
3  Jenny  2020   14000      F  Matt   2021    15000       M

If need use Names for splitting:

df1 = (df.assign(g=df.groupby('Name').cumcount(), 
               new=pd.factorize(df['Name'])[0])
        .pivot('g','new').sort_index(level=1, axis=1, sort_remaining=False))
df1.columns = [a if b==0 else f'{a}{b}' for a, b in df1.columns]

CodePudding user response：

To achieve your desire output we can use pandas concat function. by splitting your main data frame into two by Gender. To concat as rows we will reset the index of both slicer data.

Secondly, I created the loop where its check if column name is duplicate or not, if it will be duplicated will add addtional string '_1'.

Code:

df = pd.concat([df[df['Gender']=='M'].reset_index(drop=True),df[df['Gender']=='F'].reset_index(drop=True)],axis=1)
df.columns = [name if duplicated == False else name   '_1' for duplicated, name in zip(df.columns.duplicated(), df.columns)]
df

Output:

    Name    Year    Income  Gender  Name_1  Year_1  Income_1    Gender_1
0   Matt    2021    11000   M      Jenny    2020    10000       F
1   Matt    2020    12000   M      Jenny    2022    9000        F
2   Matt    2022    11000   M      Jenny    2021    13000       F
3   Matt    2021    15000   M      Jenny    2020    14000       F