Home > Net >  Pandas: Pivot dataframe with text and combine columns
Pandas: Pivot dataframe with text and combine columns

Time:04-16

I'm working with Python and Pandas and have a table like this:

      Name        Team    Fixture   Line-up     Min IN   Min Out
0     Player 1    RAY     J1        Starting             68
1     Player 2    RAY     J1        Bench       74       
2     Player 3    RSO     J2        Starting             45
3     Player 4    RSO     J2        Bench       45

I need to pivot the table making the rows of 'Fixture' as new columns containing the text of 'Line-up' the number of Min IN and OUT. Then the result should be like this:

      Name        Team    J1                J2
0     Player 1    RAY     Starting - 68
1     Player 2    RAY     Bench - 74      
2     Player 3    RSO                       Starting - 45
3     Player 4    RSO                       Bench - 45

Is there any way to make it? Thanks in advance!

CodePudding user response:

You could modify Line-up column by including the Min value, then pivot:

out = (df.assign(**{'Line-up': df['Line-up']   ' - '   
                    df.filter(like='Min').bfill(axis=1).iloc[:,0].astype(int).astype(str)})
       .pivot(['Name','Team'], 'Fixture','Line-up').rename_axis(columns=None).reset_index())

Output:

       Name Team             J1             J2
0  Player 1  RAY  Starting - 68            NaN
1  Player 2  RAY     Bench - 74            NaN
2  Player 3  RSO            NaN  Starting - 45
3  Player 4  RSO            NaN     Bench - 45

N.B. This assumes that the empty spaces in the Min columns are NaN values. If they are empty space '' actually, then you could convert them to NaN values first. So like:

out = (df.assign(**{'Line-up': df['Line-up']   ' - '   
                    df.filter(like='Min').replace('', pd.NA).bfill(axis=1).iloc[:,0].astype(int).astype(str)})
#                               here -->  ^^^^^^^^^^^^
       .pivot(['Name','Team'], 'Fixture','Line-up').rename_axis(columns=None).reset_index())

CodePudding user response:

Another version:

df = (
    df.set_index(["Name", "Team", "Fixture"])
    .apply(lambda x: " - ".join(x[x != ""]), axis=1)
    .unstack(level=2)
    .reset_index()
)
df.columns.name = ""

Prints:

       Name Team             J1             J2
0  Player 1  RAY  Starting - 68            NaN
1  Player 2  RAY     Bench - 74            NaN
2  Player 3  RSO            NaN  Starting - 45
3  Player 4  RSO            NaN     Bench - 45
  • Related