Problem
I have a data set of played football games and I'm trying to look more closely at the games each team played at home. To do that, I want to create a column with a running counter per team that tracks the consecutive wins. The following logic should apply (sketched right after the list):
- If win: increase the counter (counter += 1)
- If tie: keep the counter unchanged
- If loss: decrease the counter (counter -= 1)
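For a single team, the intended counter would therefore evolve like this (a minimal sketch on a made-up result sequence, not my actual data):

```python
# Running counter for one team's home results (illustrative sequence only).
results = ['Victory', 'Victory', 'Tie', 'Defeat']

counter = 0
counters = []
for r in results:
    if r == 'Victory':
        counter += 1
    elif r == 'Defeat':
        counter -= 1
    # a tie leaves the counter unchanged
    counters.append(counter)

print(counters)  # [1, 2, 2, 1]
```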
```python
df[['Date', 'Home', 'Home result']].head()
```
| | Date | Home | Home result |
| ---- | -------- | -------------- | ----------- |
| 0 | 1/1/2018 | Stoke City | Defeat |
| 1 | 1/1/2018 | Brighton | Tie |
| 2 | 1/1/2018 | Burnley | Defeat |
| 3 | 1/1/2018 | Leicester City | Victory |
I created a function that outputs a dictionary that can be mapped to the DataFrame (sorted by Date) to create the new counter column.
```python
def result_counter(df):
    d = {}
    for index, row in df.iterrows():
        counter = 0
        # Re-walk all games up to and including the current one and keep
        # a running counter per home team.
        for second_index, second_row in df.iloc[:index + 1].iterrows():
            if second_row['Home result'] == 'Victory':
                counter += 1
            elif second_row['Home result'] == 'Tie':
                counter += 0  # a tie keeps the counter unchanged
            else:
                counter -= 1
            if second_row['Home'] not in d:
                d[second_row['Home']] = [counter]
            else:
                d[second_row['Home']].append(counter)
    return d
```
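The dictionary is built from the frame sorted by Date; the call looks roughly like this (the reset_index is only there so the positional iloc slice inside the function lines up with iterrows):

```python
# Sort chronologically and reset the index so the iterrows index matches
# the positional slice df.iloc[:index + 1].
sorted_df = df.sort_values('Date').reset_index(drop=True)
d = result_counter(sorted_df)
```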
Expected Result
A new column should be created that holds the running counter of the home team. The expected result for a subset should look like this:
| | Date | Home | Home result | Counter |
| ---- | ---------- | --------- | ----------- | ------- |
| 0 | 2017-08-19 | Liverpool | Victory | 1 |
| 1 | 2017-08-27 | Liverpool | Victory | 2 |
| 2 | 2017-09-16 | Liverpool | Tie | 2 |
| 3 | 2017-10-14 | Liverpool | Defeat | 1 |
What I tried
I tried to map the dict to the new DataFrame. When doing this I either get a map object displayed or the whole list that corresponds to the key in the dictionary.
```python
# with subset and map
sorted_liverpool['Counter'] = map(sorted_liverpool, d)
sorted_liverpool[['Date', 'Home', 'Home result', 'Counter']].head()
```
| | Date | Home | Home result | Counter |
| ---- | ---------- | --------- | ----------- | ------------------------------ |
| 0 | 2017-08-19 | Liverpool | Victory | <map object at 0x7fc0650fe550> |
| 1 | 2017-08-27 | Liverpool | Victory | <map object at 0x7fc0650fe550> |
| 2 | 2017-09-16 | Liverpool | Tie | <map object at 0x7fc0650fe550> |
| 3 | 2017-10-14 | Liverpool | Defeat | <map object at 0x7fc0650fe550> |
```python
# with subset and map directly to "home" column
sorted_liverpool['Counter'] = sorted_liverpool['Home'].map(d)
sorted_liverpool[['Date', 'Home', 'Home result', 'Counter']].head()
```
| | Date | Home | Home result | Counter |
| ---- | ---------- | --------- | ----------- | ------------------------------------------------- |
| 0 | 2017-08-19 | Liverpool | Victory | [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 9, ... |
| 1 | 2017-08-27 | Liverpool | Victory | [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 9, ... |
| 2 | 2017-09-16 | Liverpool | Tie | [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 9, ... |
| 3 | 2017-10-14 | Liverpool | Defeat | [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 9, ... |
How can I map the created dictionary to the DataFrame so it displays only the corresponding value in the list and not the whole list?
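In other words, for Liverpool's third home game I want the third element of d['Liverpool'], something along these lines (just to illustrate what I mean, not working code I have):

```python
# Illustration only: number each team's home games, then pick the n-th
# element of that team's list (cumcount is just my guess at how to do
# the numbering).
game_no = sorted_liverpool.groupby('Home').cumcount()
sorted_liverpool['Counter'] = [
    d[team][n] for team, n in zip(sorted_liverpool['Home'], game_no)
]
```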
CodePudding user response:
Avoid loops, especially `iterrows`, when you can. You need to map `Home result` to the respective value, then `cumsum`:
```python
df['Counter'] = df['Home result'].map({
    'Victory': 1,
    'Tie': 0,
    'Defeat': -1
}).cumsum()
```
Also, you might need to `groupby` on `Home`:
```python
df['Counter'] = df['Home result'].map({
    'Victory': 1,
    'Tie': 0,
    'Defeat': -1
}).groupby(df['Home']).cumsum()
```
Output:
```
         Date       Home Home result  Counter
0  2017-08-19  Liverpool     Victory        1
1  2017-08-27  Liverpool     Victory        2
2  2017-09-16  Liverpool         Tie        2
3  2017-10-14  Liverpool      Defeat        1
```
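Note that `cumsum` accumulates in row order, so if the frame is not already chronological you would sort by Date first (a sketch, assuming pandas is imported as `pd` and `Date` is parseable):

```python
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date')  # cumsum follows row order, so order chronologically first
df['Counter'] = (
    df['Home result']
      .map({'Victory': 1, 'Tie': 0, 'Defeat': -1})
      .groupby(df['Home'])
      .cumsum()
)
```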
CodePudding user response:
```python
print(df)
```

```
         Date        Home Home result
0  2017-08-19   Liverpool     Victory
1  2017-08-27   Liverpool     Victory
2  2017-09-16   Liverpool         Tie
3  2017-10-14   Liverpool      Defeat
4  2017-08-20  Stoke City      Defeat
5  2017-08-28  Stoke City         Tie
6  2017-09-17  Stoke City      Defeat
7  2017-10-15  Stoke City     Victory
```
```python
df['new column'] = df['Home result'].map({'Victory': 1, 'Tie': 0, 'Defeat': -1})
df['new column'] = df.groupby('Home')['new column'].transform(pd.Series.cumsum)
print(df)
```
```
         Date        Home Home result  new column
0  2017-08-19   Liverpool     Victory           1
1  2017-08-27   Liverpool     Victory           2
2  2017-09-16   Liverpool         Tie           2
3  2017-10-14   Liverpool      Defeat           1
4  2017-08-20  Stoke City      Defeat          -1
5  2017-08-28  Stoke City         Tie          -1
6  2017-09-17  Stoke City      Defeat          -2
7  2017-10-15  Stoke City     Victory          -1
```
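The `transform(pd.Series.cumsum)` step can equivalently be written by calling `cumsum` on the groupby directly (keeping the first mapping line as above):

```python
# Same cumulative sum per team, slightly shorter form.
df['new column'] = df.groupby('Home')['new column'].cumsum()
```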
CodePudding user response:
Try grouping by home team and taking the cumulative sum (`cumsum`) of the scores produced by an `apply` mapping:
```python
import pandas as pd
from io import StringIO

txt = """Date,Home,Home Result
1/1/2018,Stoke City,Defeat
1/1/2018,Brighton,Tie
1/1/2018,Burnley,Defeat
1/1/2018,Leicester City,Victory
2017-08-19,Liverpool,Victory
2017-08-27,Liverpool,Victory
2017-09-16,Liverpool,Tie
2017-10-14,Liverpool,Defeat"""

df = pd.read_table(StringIO(txt), sep=',')
df['Date'] = pd.to_datetime(df['Date'])

# Map the textual result to a score, then accumulate it per home team.
df['Outcome'] = df['Home Result'].apply(
    lambda x: -1 if x == 'Defeat' else 0 if x == 'Tie' else 1 if x == 'Victory' else 0)
df = df.sort_values(by=['Home', 'Date'], ascending=True)
df['Date_Total'] = df.groupby(['Home'])['Outcome'].cumsum()
print(df)
```
Output:
```
        Date            Home Home Result  Outcome  Date_Total
1 2018-01-01        Brighton         Tie        0           0
2 2018-01-01         Burnley      Defeat       -1          -1
3 2018-01-01  Leicester City     Victory        1           1
4 2017-08-19       Liverpool     Victory        1           1
5 2017-08-27       Liverpool     Victory        1           2
6 2017-09-16       Liverpool         Tie        0           2
7 2017-10-14       Liverpool      Defeat       -1           1
0 2018-01-01      Stoke City      Defeat       -1          -1
```
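If the original row order matters, it can be restored afterwards, since `sort_values` kept the original index (a small follow-up, assuming the default RangeIndex from `read_table`):

```python
# Restore the original row order; the accumulated Date_Total values
# stay attached to their rows.
df = df.sort_index()
print(df)
```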