Home > database >  Map Counter Object to DataFrame to create new column
Map Counter Object to DataFrame to create new column

Time:12-11

Problem


I have a data set with played football games and I'm trying to look more closely into the games played by each team at home. Therefore, I am trying to create a column for every team with a counter that tracks the consecutive wins. The following logic should apply:

  • If win: increase counter = 1
  • If tie: keep counter
  • If loss: decrease counter -= 1
df[['Date', 'Home', 'Home result']].head()

|      | Date     | Home           | Home result |
| ---- | -------- | -------------- | ----------- |
| 0    | 1/1/2018 | Stoke City     | Defeat      |
| 1    | 1/1/2018 | Brighton       | Tie         |
| 2    | 1/1/2018 | Burnley        | Defeat      |
| 3    | 1/1/2018 | Leicester City | Victory     |

I created a function that outputs a dictionary that can be mapped to the DataFrame (sorted by Date) to create the new counter column.


    def result_counter(df):
        d = {}
        for index, row in df.iterrows():
            counter = 0
            for second_index, second_row in df.iloc[:index 1].iterrows(): 
                if second_row['Home result'] == 'Victory':
                    counter  = 1
                elif second_row['Home result'] == 'Tie':
                    counter  = 0
                else:
                    counter -= 1
            if second_row['Home'] not in d:
                d[second_row['Home']] = [counter]
            else:
                d[second_row['Home']].append(counter)
                
        return d

Expected Result


A new column should be created that indicates the consecutive score of the home team. The expected result for a subset should look like this:

|      | Date       | Home      | Home result | Counter |
| ---- | ---------- | --------- | ----------- | ------- |
| 0    | 2017-08-19 | Liverpool | Victory     | 1       |
| 1    | 2017-08-27 | Liverpool | Victory     | 2       |
| 2    | 2017-09-16 | Liverpool | Tie         | 2       |
| 3    | 2017-10-14 | Liverpool | Defeat      | 1       |

What I tried


I tried to map the dict to the new DataFrame. When doing this I either get a map object displayed or the whole list that corresponds to the key in the dictionary.

# with subset and map 
sorted_liverpool['Counter'] = map(sorted_liverpool, d)
sorted_liverpool[['Date', 'Home', 'Home result', 'Counter']].head()

|      | Date       | Home      | Home result | Counter                        |
| ---- | ---------- | --------- | ----------- | ------------------------------ |
| 0    | 2017-08-19 | Liverpool | Victory     | <map object at 0x7fc0650fe550> |
| 1    | 2017-08-27 | Liverpool | Victory     | <map object at 0x7fc0650fe550> |
| 2    | 2017-09-16 | Liverpool | Tie         | <map object at 0x7fc0650fe550> |
| 3    | 2017-10-14 | Liverpool | Defeat      | <map object at 0x7fc0650fe550> |
# with subset and map directly to "home" column
sorted_liverpool['Counter'] = sorted_liverpool['Home'].map(d)
sorted_liverpool[['Date', 'Home', 'Home result', 'Counter']].head()

|      | Date       | Home      | Home result | Counter                                           |
| ---- | ---------- | --------- | ----------- | ------------------------------------------------- |
| 0    | 2017-08-19 | Liverpool | Victory     | [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 9, ... |
| 1    | 2017-08-27 | Liverpool | Victory     | [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 9, ... |
| 2    | 2017-09-16 | Liverpool | Tie         | [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 9, ... |
| 3    | 2017-10-14 | Liverpool | Defeat      | [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 9, ... |

How can I map the created dictionary to the DataFrame so it displays only the corresponding value in the list and not the whole list?

CodePudding user response:

Avoid loop, especially iterrows when you can.

You need to map the Home result to respective value, then cumsum:

df['Counter'] = df['Home result'].map({
   'Victory': 1,
   'Tie': 0,
   'Defeat': -1
}).cumsum()

Also, you might need to groupby on Home:

df['Counter'] = df['Home result'].map({
       'Victory': 1,
       'Tie': 0,
       'Defeat': -1
    }).groupby(df['Home']).cumsum()

Output:

         Date       Home Home result  Counter
0  2017-08-19  Liverpool     Victory        1
1  2017-08-27  Liverpool     Victory        2
2  2017-09-16  Liverpool         Tie        2
3  2017-10-14  Liverpool      Defeat        1

CodePudding user response:

print(df)

        Date        Home Home result
0 2017-08-19   Liverpool     Victory
1 2017-08-27   Liverpool     Victory
2 2017-09-16   Liverpool         Tie
3 2017-10-14   Liverpool      Defeat
4 2017-08-20  Stoke City      Defeat
5 2017-08-28  Stoke City         Tie
6 2017-09-17  Stoke City      Defeat
7 2017-10-15  Stoke City     Victory

    df['new column'] = df['Home result'].map({'Victory': 1, 'Tie': 0, 'Defeat': -1})
    df['new column'] = df.groupby('Home')['new column'].transform(pd.Series.cumsum)
    print(df)

        Date        Home Home result  new column
0 2017-08-19   Liverpool     Victory           1
1 2017-08-27   Liverpool     Victory           2
2 2017-09-16   Liverpool         Tie           2
3 2017-10-14   Liverpool      Defeat           1
4 2017-08-20  Stoke City      Defeat          -1
5 2017-08-28  Stoke City         Tie          -1
6 2017-09-17  Stoke City      Defeat          -2
7 2017-10-15  Stoke City     Victory          -1

CodePudding user response:

try group by home and accumulative sum cumsum the results of an apply logic

txt="""Date,Home,Home Result
1/1/2018,Stoke City,Defeat
1/1/2018,Brighton,Tie
1/1/2018,Burnley,Defeat
1/1/2018,Leicester City,Victory
2017-08-19,Liverpool,Victory
2017-08-27,Liverpool,Victory
2017-09-16,Liverpool,Tie
2017-10-14,Liverpool,Defeat"""

df = pd.read_table(StringIO(txt), sep=',')    
df['Date']=pd.to_datetime(df['Date'])
df['Outcome']=df['Home Result'].apply(lambda x: -1 if x=='Defeat' else 0 if x=='Tie' else 1 if x=='Victory' else 0)
df=df.sort_values(by=['Home','Date'],ascending=True)

df['Date_Total'] = df.groupby(['Home'])['Outcome'].cumsum()
print(df)

output

 Date            Home Home Result  Outcome  Date_Total
1 2018-01-01        Brighton         Tie        0           0
2 2018-01-01         Burnley      Defeat       -1          -1
3 2018-01-01  Leicester City     Victory        1           1
4 2017-08-19       Liverpool     Victory        1           1
5 2017-08-27       Liverpool     Victory        1           2
6 2017-09-16       Liverpool         Tie        0           2
7 2017-10-14       Liverpool      Defeat       -1           1
0 2018-01-01      Stoke City      Defeat       -1          -1
  • Related