If I have a pandas dataframe it's possible to get values from a row and use it as a label for a new column?
I have something like this:
| Team| DateTime| Score
| Red| 2021/03/19 | 5
| Red| 2021/03/20 | 10
| Blue| 2022/04/10 | 20
I would like to write this data on a new dataframe that has:
Team Column
Year/Month SumScore Column
So I would have a row per team with multiple new columns for a month in a year that contains the sum of the score for a specific month. It should be like this:
Team | 2021/03 | 2022/04 |
---|---|---|
Red | 15 | 0 |
Blue | 0 | 20 |
The date format time is YYYY/MM/DD
I hope I was clear
CodePudding user response:
You can use
df = (df.assign(YM=df['DateTime'].str.rsplit('/', 1).str[0])
.pivot_table(index='Team', columns='YM', values='Score', aggfunc='sum', fill_value=0)
.reset_index())
print(df)
YM Team 2021/03 2022/04
0 Blue 0 20
1 Red 15 0
CodePudding user response:
We can use pd.crosstab
which allows us to
Compute a simple cross tabulation of two (or more) factors
Below I've changed df['DateTime']
to contain year/month
only.
df['DateTime'] = pd.to_datetime(df['DateTime']).dt.strftime('%Y/%m')
pd.crosstab(
df['Team'],
df['DateTime'],
values=df['Score'],
aggfunc='sum'
).fillna(0)
If you don't want multiple levels in the index, just use the method call reset_index
on your crosstab and then drop DateTime
.