I am trying to create a new column in a pandas DataFrame from the values of another column. Starting from the goals column, I want to build, for each team, an array of 10 positions that describes how the goals are distributed: position 0 shows how many players have scored between 0 and 10 goals, and so on up to the last position (9), which counts how many have scored between 90 and 100.
For example:
From a dataframe like this one:
| player_id | team | goals |
|-----------|---------|--------|
| ply_1 | Arsenal | 100 |
| ply_2 | Arsenal | 2 |
| ply_3 | Chelsea | 21 |
| ply_4 | Chelsea | 13 |
| ply_5 | Arsenal | 50 |
Get one like the following:
| player_id | team | goals | goals_distribution_by_team |
|-----------|---------|-------|---------------------------|
| ply_1 | Arsenal | 100 | [1,0,0,0,0,1,0,0,0,1] |
| ply_2 | Arsenal | 2 | [1,0,0,0,0,1,0,0,0,1] |
| ply_3 | Chelsea | 21 | [0,1,1,0,0,0,0,0,0,0] |
| ply_4 | Chelsea | 13 | [0,1,1,0,0,0,0,0,0,0] |
| ply_5 | Arsenal | 50 | [1,0,0,0,0,1,0,0,0,1] |
In this case we can see that for the `Arsenal` team the goals are distributed over the ranges `0-10` (ply_2), `50-60` (ply_5) and `90-100` (ply_1).
So far I have achieved this with a `for` loop that goes through the goals column and checks which range each number of goals falls in:

    for goal in goals:
        if 0 < goal <= 10:
            count()
        if 10 < goal <= 20:
            count()
        ....
        if 90 < goal <= 100:
            count()

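A runnable sketch of the loop above, assuming the ranges are exactly the ones written in the conditions, i.e. `(0, 10]`, `(10, 20]`, ..., `(90, 100]`. The function name `count_by_range` and its parameters are hypothetical, chosen only for illustration:

```python
def count_by_range(goals, n_bins=10, width=10):
    """Count how many values fall in (0, 10], (10, 20], ..., (90, 100]."""
    counts = [0] * n_bins
    for goal in goals:
        for i in range(n_bins):
            # Same test as the chained ifs: lower bound excluded, upper bound included.
            if i * width < goal <= (i + 1) * width:
                counts[i] += 1
                break
    return counts


# Goals of the three Arsenal players in the sample data.
arsenal = count_by_range([100, 2, 50])
print(arsenal)
```

With these conditions a player with exactly 50 goals lands in position 4 (`40 < goal <= 50`), whereas the expected table above places ply_5 in position 5, and a player with 0 goals is not counted at all. The boundaries are therefore worth pinning down before choosing a vectorized approach.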
Is there a more Pythonic way to achieve this?
Thanks!
CodePudding user response:
Use `get_dummies` on the column after integer division by 10, then add `0` columns for the missing range values with `DataFrame.reindex`, propagate the `1` values within each group with `GroupBy.transform`, and finally convert to lists:
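
    import pandas as pd

    # One indicator column per bucket (goals // 10); dtype=int gives 0/1 rather than booleans
    df1 = pd.get_dummies(df['goals'] // 10, dtype=int)
    # Cover every bucket from 0 up to the largest one present
    max1 = range(df1.columns.max() + 1)
    df['goals_distribution_by_team'] = (df1.reindex(max1, axis=1, fill_value=0)
                                           .groupby(df['team']).transform('max')
                                           .to_numpy().tolist())
    print(df)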

      player_id     team  goals         goals_distribution_by_team
    0     ply_1  Arsenal    100  [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
    1     ply_2  Arsenal      2  [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
    2     ply_3  Chelsea     21  [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    3     ply_4  Chelsea     13  [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    4     ply_5  Arsenal     50  [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]

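The lists above have 11 entries because `100 // 10` is 10, while the question asks for 10 positions. Below is a minimal sketch of the same steps on the sample data. It assumes (the answer does not say this) that 100 should be folded into position 9, as in the question's expected table. The names `bucket`, `dummies` and `per_team` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "player_id": ["ply_1", "ply_2", "ply_3", "ply_4", "ply_5"],
    "team": ["Arsenal", "Arsenal", "Chelsea", "Chelsea", "Arsenal"],
    "goals": [100, 2, 21, 13, 50],
})

# Bucket index per player; clipping at 9 is an assumption so that 100 shares the last position.
bucket = (df["goals"] // 10).clip(upper=9)

# One 0/1 column per bucket that occurs, then add the missing buckets as all-zero columns.
dummies = pd.get_dummies(bucket, dtype=int).reindex(range(10), axis=1, fill_value=0)

# Within each team, a bucket is 1 if any player of that team falls into it.
per_team = dummies.groupby(df["team"]).transform("max")

# One list of 10 flags per row.
df["goals_distribution_by_team"] = per_team.to_numpy().tolist()
print(df)
```

Replacing `transform("max")` with `transform("sum")` would give a count of players per bucket instead of a 0/1 flag, if counts are what is actually needed.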
CodePudding user response:
One way is to group the dataframe by `team`, then apply a function to get those ones and zeroes as lists:

    out = (df.groupby('team')['goals']
             .apply(lambda x: [int((i == x // 10).any()) for i in range(11)])
          )

    team
    Arsenal    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
    Chelsea    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    Name: goals, dtype: object

Once you have it, you can merge it back to the original dataframe:

    df.reset_index().merge(out.rename('goals_distribution_by_team'), on='team')

      player_id     team  goals         goals_distribution_by_team
    0     ply_1  Arsenal    100  [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
    1     ply_2  Arsenal      2  [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
    2     ply_5  Arsenal     50  [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
    3     ply_3  Chelsea     21  [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    4     ply_4  Chelsea     13  [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
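
If the original row order and index should be kept, one alternative to the merge is to look up each row's team in `out` with `Series.map`. This is a small self-contained sketch on the sample data. It swaps `merge` for `map`, which is not what the answer above uses:

```python
import pandas as pd

df = pd.DataFrame({
    "player_id": ["ply_1", "ply_2", "ply_3", "ply_4", "ply_5"],
    "team": ["Arsenal", "Arsenal", "Chelsea", "Chelsea", "Arsenal"],
    "goals": [100, 2, 21, 13, 50],
})

# One list of 0/1 flags per team: flag i is 1 if any player has goals // 10 == i.
out = df.groupby("team")["goals"].apply(
    lambda x: [int((x // 10 == i).any()) for i in range(11)]
)

# Look up every row's team in the per-team result; rows stay in their original order.
df["goals_distribution_by_team"] = df["team"].map(out)
print(df)
```

Rows that belong to the same team end up referencing the same list object, so the lists are best treated as read-only.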