Home > Software design >  How do I transpose and aggregate this dataframe in right order?
How do I transpose and aggregate this dataframe in right order?

Time:06-28

I am trying to find an efficient way to create a dataframe which lists all distinct game values as the columns and then aggregates the rows by user_id for game play hours accordingly? This is my example df:

user_id | game      | game_hours | rank_order
1       | Fortnight | 1.5        | 1
1       | COD       | 0.5        | 2
1       | Horizon   | 1.7        | 3
1       | ...       | ...        | n
2       | Fifa2021  | 1.9        | 1
2       | A Way Out | 0.2        | 2
2       | ...       | ...        | n
...

Step 1: How do I get this to this df format (match rank order correctly due to time sequence)?

user_id | game_1     | game_2    | game_3 | game_n ...| game_hours
1       | Fortnight  | COD       | Horizon|           | 3.7
2       | Fifa21     | A Way Out |        |           | 2.1
...

CodePudding user response:

Use DataFrame.pivot with DataFrame.add_prefix and for new column DataFrame.assign with aggregation sum:

df = (df.pivot('user_id','rank_order','game')
        .add_prefix('game_')
        .assign(game_hours=df.groupby('user_id')['game_hours'].sum())
        .reset_index()
        .rename_axis(None, axis=1))

print (df)
   user_id     game_1     game_2   game_3  game_hours
0        1  Fortnight        COD  Horizon         3.7
1        2   Fifa2021  A Way Out      NaN         2.1
  • Related