How to transpose two particular columns and keep the first row in Python?

I have the data frame as follows:

import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Name': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'Date': ['2019-12-20', '2018-12-20', '2017-12-20', '2016-12-20', '2015-12-20', '2014-12-20', '2013-12-20'],
    'Color': ['Black', 'Blue', 'Red', 'Yellow', 'White', 'Sky', 'Green']
})

Or, printed as a table:

   ID Name        Date   Color
0  12    A  2019-12-20   Black
1  12    A  2018-12-20    Blue
2  15    B  2017-12-20     Red
3  15    B  2016-12-20  Yellow
4  16    C  2015-12-20   White
5  17    D  2014-12-20     Sky
6  17    D  2013-12-20   Green

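My desired result is the table below: one row per ID and Name, with the second row's Date and Color moved into new columns Date_ and Color_. A group with a single row (ID 16) repeats its own values. How can I get that?

   ID Name        Date  Color       Date_  Color_
0  12    A  2019-12-20  Black  2018-12-20    Blue
1  15    B  2017-12-20    Red  2016-12-20  Yellow
2  16    C  2015-12-20  White  2015-12-20   White
3  17    D  2014-12-20    Sky  2013-12-20   Green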

I need your help, thanks in advance!

CodePudding user response:

Number the rows within each group with cumcount() and use that number as the pivot column, so each row of a group lands in its own set of columns. The rest is just formatting.

# Number the rows within each group; this number becomes the column suffix
out = df.assign(col=df.groupby('Name').cumcount().astype(str)) \
        .pivot(index=['ID', 'Name'], columns='col', values=['Date', 'Color']) \
        .ffill(axis=1)  # single-row groups reuse their own values

# Sort the columns to match the desired output
out = out.sort_index(level=[1, 0], axis=1, ascending=[True, False])

# Flatten the MultiIndex columns
out.columns = out.columns.to_flat_index().str.join('_')

# Move ID and Name back into regular columns
out = out.reset_index()

Output:

>>> out
   ID Name      Date_0 Color_0      Date_1 Color_1
0  12    A  2019-12-20   Black  2018-12-20    Blue
1  15    B  2017-12-20     Red  2016-12-20  Yellow
2  16    C  2015-12-20   White  2015-12-20   White
3  17    D  2014-12-20     Sky  2013-12-20   Green

After the pivot and the forward fill, the dataframe looks like this:

>>> df.assign(col=df.groupby('Name').cumcount().astype(str)) \
      .pivot(index=['ID', 'Name'], columns='col', values=['Date', 'Color']) \
      .ffill(axis=1)

               Date              Color        
col               0           1      0       1
ID Name                                       
12 A     2019-12-20  2018-12-20  Black    Blue
15 B     2017-12-20  2016-12-20    Red  Yellow
16 C     2015-12-20  2015-12-20  White   White
17 D     2014-12-20  2013-12-20    Sky   Green
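
The helper column is easier to follow in isolation. A minimal sketch, reusing the ID and Name columns from the question: cumcount() numbers the rows inside each Name group from 0, and that number becomes the column suffix after the pivot. The variable name pos is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Name': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
})

# Position of each row inside its Name group, starting at 0
pos = df.groupby('Name').cumcount()
print(pos.tolist())  # [0, 1, 0, 1, 0, 0, 1]
```

Group C only has position 0, so its Date_1 and Color_1 cells come out of the pivot empty. ffill(axis=1) then copies the value from the column to their left into them.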

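CodePudding user response:

Try a self-merge: join the dataframe to itself on ID and Name, then drop duplicates so that each group keeps only the pair made of its first and last row: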

result = (
    df.merge(df, on=['ID', 'Name'])
      .drop_duplicates(['ID', 'Name', 'Date_x', 'Color_x'], keep='last')
      .drop_duplicates(['ID', 'Name', 'Date_y', 'Color_y'])
      .rename(columns={'Date_x': 'Date',
                       'Color_x': 'Color',
                       'Date_y': 'Date_',
                       'Color_y': 'Color_'})
)

This is the result:

    ID Name        Date  Color       Date_  Color_
1   12    A  2019-12-20  Black  2018-12-20    Blue
5   15    B  2017-12-20    Red  2016-12-20  Yellow
8   16    C  2015-12-20  White  2015-12-20   White
10  17    D  2014-12-20    Sky  2013-12-20   Green
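
To see what each step of the chained call does, it may help to split it up and count rows. This is only a sketch on the question's data; the names pairs, step1 and step2 are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Name': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'Date': ['2019-12-20', '2018-12-20', '2017-12-20', '2016-12-20',
             '2015-12-20', '2014-12-20', '2013-12-20'],
    'Color': ['Black', 'Blue', 'Red', 'Yellow', 'White', 'Sky', 'Green'],
})

# Every row paired with every row of the same (ID, Name) group
pairs = df.merge(df, on=['ID', 'Name'])

# For each left-hand row, keep only its pairing with the group's last row
step1 = pairs.drop_duplicates(['ID', 'Name', 'Date_x', 'Color_x'], keep='last')

# For each right-hand row, keep only the first left-hand row paired with it
step2 = step1.drop_duplicates(['ID', 'Name', 'Date_y', 'Color_y'])

print(len(pairs), len(step1), len(step2))  # 13 7 4
```

With three or more rows in a group, the same two steps would keep only the first row paired with the last one, so this approach fits data shaped like the question's, with at most two rows per group.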

CodePudding user response:

A more step-by-step approach, rather than an idiomatic pandas one:

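# Count how many rows each ID has
df['count'] = df['ID'].map(df['ID'].value_counts())
# Duplicate the rows of single-row IDs so that every ID has at least two rows
df = pd.concat([df, df[df['count'] == 1]]).drop(columns=['count'])
# Give every row a running number
df['s'] = range(len(df))
# Pair every row with every row of the same ID
df = df.merge(df, left_on='ID', right_on='ID')
# Keep only the pairs where the left row comes before the right row
df = df[df['s_x'] < df['s_y']].drop(columns=['s_x', 's_y', 'Name_y']).rename(
    columns={
        'Name_x': 'Name',
        'Date_x': 'Date',
        'Color_x': 'Color',
        'Date_y': 'Date_',
        'Color_y': 'Color_'})
print(df)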

Output:

    ID Name        Date  Color       Date_  Color_
1   12    A  2019-12-20  Black  2018-12-20    Blue
5   15    B  2017-12-20    Red  2016-12-20  Yellow
9   16    C  2015-12-20  White  2015-12-20   White
13  17    D  2014-12-20    Sky  2013-12-20   Green
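
The padding step at the start of this answer is what makes the single-row ID 16 survive the later filter. A small sketch of just that step, on a trimmed copy of the question's data; the names counts and padded are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Color': ['Black', 'Blue', 'Red', 'Yellow', 'White', 'Sky', 'Green'],
})

# Number of rows per ID, aligned to each row of df
counts = df['ID'].map(df['ID'].value_counts())

# Append a copy of every row whose ID occurs only once
padded = pd.concat([df, df[counts == 1]])

# Rows per ID after padding, in ID order (12, 15, 16, 17)
print(padded['ID'].value_counts().sort_index().tolist())  # [2, 2, 2, 2]
```

After padding, ID 16 has two identical rows, so the self-merge produces a pair for it and the output repeats White in both Color and Color_.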