I have a question. In my dataframe, I have names (of players of a game) stored in the four columns W1, W2, W3, W4 for the four players that won a round and in L1, L2, L3, L4 that lost a round. There is also a column WP for winning points (depending on map played) and LP for losing points which is simply -WP. These would be the .head() of the 10 columns in question. Index are the rounds played. Its basically a history/tracker of past games.
{'W1': {0: nan, 1: 'Matthi', 2: 'Vain', 3: 'Matthi', 4: 'Markus'},
'W2': {0: nan, 1: 'Paddy', 2: 'Marvin', 3: 'Jonas', 4: 'Marvin'},
'W3': {0: nan, 1: 'Max', 2: 'Paddy', 3: 'Vain', 4: 'Johann'},
'W4': {0: nan, 1: nan, 2: 'Max', 3: 'Johannes', 4: 'Max'},
'WP': {0: nan, 1: 5.0, 2: 5.0, 3: 3.0, 4: 4.0},
'L1': {0: nan, 1: 'Timi', 2: 'Matthi', 3: 'Timi', 4: 'Matthi'},
'L2': {0: nan, 1: 'Markus', 2: 'Timi', 3: 'Markus', 4: 'Timi'},
'L3': {0: nan, 1: 'Marvin', 2: 'Markus', 3: 'Marvin', 4: 'Vain'},
'L4': {0: nan, 1: nan, 2: 'Johannes', 3: 'Nille', 4: 'Paddy'},
'LP': {0: nan, 1: -5.0, 2: -5.0, 3: -3.0, 4: -4.0}}
What I want to do, is make a ranking, so for example for "Max" add WP for every time he was in either of the winning player columns and subtract LP for every time he was in either of the losing player columns.
I hoped I could just do this:
df.groupby("W1")["WP"].sum() df.groupby("W2")["WP"].sum() df.groupby("W3")["WP"].sum() df.groupby("W4")["WP"].sum() df.groupby("L1")["LP"].sum() df.groupby("L2")["LP"].sum() df.groupby("L3")["LP"].sum() df.groupby("L4")["LP"].sum()
But that gives me (lots of NaN that "shouldnt" be there, this is over the whole df of course):
Funkey NaN
Johann NaN
Johannes -16.0
Jonas 22.0
Markus NaN
Marvin 41.0
Matthi NaN
Max 38.0
Nille NaN
Paddy -2.0
Timi NaN
Vain -16.0
dtype: float64
The weird thing (to me) is that all the values that are not NaN are correct.
Now, the problem, I think, is that the names are somewhat random across the four columns. It also means not each of the columns will have all the names. E.g. "Max" can sometimes appear in W1 and sometimes in W2, but maybe not in W3 or W4 ever.
I also thought I could do it with:
pd.Series.add(df.groupby("W1")["WP"].sum(), df.groupby("W2")["WP"].sum(), etc., fill_value=0)
But that gives me an error if I put more than three of the groupbys.
How can I achieve what I want to do?
CodePudding user response:
You can use melt
to flatten your dataframe and compute points for each player.
df1 = df.melt(['WP', 'LP'], var_name='W/L', value_name='Player').dropna()
df1['Points'] = np.where(df1['W/L'].str[0] == 'W', df1['WP'], df1['LP'])
out = df1.groupby('Player', as_index=False)['Points'].sum() \
.sort_values('Points', ascending=False, ignore_index=True)
Output
Player | Points |
---|---|
Max | 14 |
Paddy | 6 |
Johann | 4 |
Vain | 4 |
Jonas | 3 |
Marvin | 1 |
Matthi | -1 |
Johannes | -2 |
Nille | -3 |
Markus | -9 |
Timi | -17 |
Check df1
to see the intermediate dataframe.
CodePudding user response:
IIUC, melt
to reshape, compute the points based on the column name (if starts with W
), and grouby
sum
:
import numpy as np
(df
.melt(id_vars=['WP','LP'], value_name='name')
.assign(points=lambda d: np.where(d['variable'].str.startswith('W'), d['WP'], d['LP']))
.groupby('name')['points'].sum()
)
Output:
name
Johann 4.0
Johannes -2.0
Jonas 3.0
Markus -9.0
Marvin 1.0
Matthi -1.0
Max 14.0
Nille -3.0
Paddy 6.0
Timi -17.0
Vain 4.0
Name: points, dtype: float64