Home > Net >  How to add multiple series in pandas correctly?
How to add multiple series in pandas correctly?

Time:03-31

I have a question. In my dataframe, I have names (of players of a game) stored in the four columns W1, W2, W3, W4 for the four players that won a round and in L1, L2, L3, L4 that lost a round. There is also a column WP for winning points (depending on map played) and LP for losing points which is simply -WP. These would be the .head() of the 10 columns in question. Index are the rounds played. Its basically a history/tracker of past games.

{'W1': {0: nan, 1: 'Matthi', 2: 'Vain', 3: 'Matthi', 4: 'Markus'},
 'W2': {0: nan, 1: 'Paddy', 2: 'Marvin', 3: 'Jonas', 4: 'Marvin'},
 'W3': {0: nan, 1: 'Max', 2: 'Paddy', 3: 'Vain', 4: 'Johann'},
 'W4': {0: nan, 1: nan, 2: 'Max', 3: 'Johannes', 4: 'Max'},
 'WP': {0: nan, 1: 5.0, 2: 5.0, 3: 3.0, 4: 4.0},
 'L1': {0: nan, 1: 'Timi', 2: 'Matthi', 3: 'Timi', 4: 'Matthi'},
 'L2': {0: nan, 1: 'Markus', 2: 'Timi', 3: 'Markus', 4: 'Timi'},
 'L3': {0: nan, 1: 'Marvin', 2: 'Markus', 3: 'Marvin', 4: 'Vain'},
 'L4': {0: nan, 1: nan, 2: 'Johannes', 3: 'Nille', 4: 'Paddy'},
 'LP': {0: nan, 1: -5.0, 2: -5.0, 3: -3.0, 4: -4.0}}

What I want to do, is make a ranking, so for example for "Max" add WP for every time he was in either of the winning player columns and subtract LP for every time he was in either of the losing player columns.

I hoped I could just do this:

df.groupby("W1")["WP"].sum()  df.groupby("W2")["WP"].sum()  df.groupby("W3")["WP"].sum()  df.groupby("W4")["WP"].sum()  df.groupby("L1")["LP"].sum()  df.groupby("L2")["LP"].sum()  df.groupby("L3")["LP"].sum()  df.groupby("L4")["LP"].sum()

But that gives me (lots of NaN that "shouldnt" be there, this is over the whole df of course):

Funkey       NaN
Johann       NaN
Johannes   -16.0
Jonas       22.0
Markus       NaN
Marvin      41.0
Matthi       NaN
Max         38.0
Nille        NaN
Paddy       -2.0
Timi         NaN
Vain       -16.0
dtype: float64

The weird thing (to me) is that all the values that are not NaN are correct.

Now, the problem, I think, is that the names are somewhat random across the four columns. It also means not each of the columns will have all the names. E.g. "Max" can sometimes appear in W1 and sometimes in W2, but maybe not in W3 or W4 ever.

I also thought I could do it with:

pd.Series.add(df.groupby("W1")["WP"].sum(), df.groupby("W2")["WP"].sum(), etc., fill_value=0)

But that gives me an error if I put more than three of the groupbys.

How can I achieve what I want to do?

CodePudding user response:

You can use melt to flatten your dataframe and compute points for each player.

df1 = df.melt(['WP', 'LP'], var_name='W/L', value_name='Player').dropna()
df1['Points'] = np.where(df1['W/L'].str[0] == 'W', df1['WP'], df1['LP'])
out = df1.groupby('Player', as_index=False)['Points'].sum() \
         .sort_values('Points', ascending=False, ignore_index=True)

Output

Player Points
Max 14
Paddy 6
Johann 4
Vain 4
Jonas 3
Marvin 1
Matthi -1
Johannes -2
Nille -3
Markus -9
Timi -17

Check df1 to see the intermediate dataframe.

CodePudding user response:

IIUC, melt to reshape, compute the points based on the column name (if starts with W), and grouby sum:

import numpy as np
(df
 .melt(id_vars=['WP','LP'], value_name='name')
 .assign(points=lambda d: np.where(d['variable'].str.startswith('W'), d['WP'], d['LP']))
 .groupby('name')['points'].sum()
 )

Output:

name
Johann       4.0
Johannes    -2.0
Jonas        3.0
Markus      -9.0
Marvin       1.0
Matthi      -1.0
Max         14.0
Nille       -3.0
Paddy        6.0
Timi       -17.0
Vain         4.0
Name: points, dtype: float64
  • Related