Football DataFrame - Setting Season From Date

Time:10-29

This should be a simple task, but I cannot figure it out!

I have a dataframe containing English Premier League football data. All I want to do is add a season column and use the date to set the season, e.g. Date = 2021-08-13, season = 2021.


But try as I might, I cannot make it work. How do I do this? My last attempt:

def get_season(row):
    if (row['Date'] > pd.to_datetime("01/08/2021")):
        season = "2021"
prem_data_new['Season'] = prem_data_new.apply(lambda row: get_season(row),axis = 1)

prem_data_new

CodePudding user response:

  1. Check the type of prem_data_new['Date'].
  2. If it is datetime, use something like this:
    prem_data_new['Season'] = prem_data_new['Date'].dt.year
  3. If not, convert it first:
    prem_data_new['Date'] = pd.to_datetime(prem_data_new['Date'], errors='coerce')

and then go back to step 2.
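
The three steps above can be sketched roughly as follows. The small frame and the 'Year' column name are made up for illustration, and the invalid entry is only there to show what errors='coerce' does:

```python
import pandas as pd

# Hypothetical sample data; the 'Date' column starts out as plain strings.
df = pd.DataFrame({"Date": ["2021-08-13", "2020-07-30", "not a date"]})

# Step 1: check whether the column already holds datetimes.
if not pd.api.types.is_datetime64_any_dtype(df["Date"]):
    # Step 3: parse the strings; values that cannot be parsed become NaT.
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Step 2: the .dt accessor exposes datetime fields for the whole column.
df["Year"] = df["Date"].dt.year

print(df)
```

Note that this only extracts the calendar year; rows that failed to parse end up with a missing year.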


To get the season, you can try this function:

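def func(ser):
    # Labels each date with the year its season ends in:
    # January-July -> same year, August-December -> following year.
    # For the start-year labelling asked for in the question
    # (2021-08-13 -> 2021), use year - 1 and year instead.
    lst = []
    for i in range(len(ser)):
        if ser.iloc[i].month < 8:
            lst.append(ser.iloc[i].year)
        else:
            lst.append(ser.iloc[i].year + 1)
    return lst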

prem_data_new['Season'] = func(prem_data_new['Date'])

or

prem_data_new['Season'] = [x.year if x.month < 8 else x.year + 1 for x in prem_data_new['Date']]

CodePudding user response:

You can also use the map function: define a function that returns the season year for a single date, then map it over the Date column and assign the result to a new column of the dataframe.
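
A minimal sketch of that idea, assuming the Date column is already datetime and that a season starts on 1 August and is labelled by its starting year (season_from_date and the sample dates are hypothetical):

```python
import pandas as pd

def season_from_date(d):
    # Hypothetical helper: August-December belong to the season starting
    # that year, January-July to the season that started the year before.
    return d.year if d.month >= 8 else d.year - 1

df = pd.DataFrame(
    {"Date": pd.to_datetime(["2021-08-13", "2021-05-23", "2020-07-30"])}
)

# Series.map calls the function once per element (a Timestamp here).
df["Season"] = df["Date"].map(season_from_date)

print(df)
```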

CodePudding user response:

Since you are deciding the season based on the month and day (e.g. 1st of Aug), your function should compare the month and day of the date field.

Demo:

import pandas as pd

prem_data_new = pd.DataFrame({
    'div':['e0','e0','e0','e0','e0'],
    'date':['2021-08-13','2021-08-14','2020-07-30','2020-08-13','2021-08-01'],
    'hometown':['Brentford','Man United','Burnley','Chelsea','Everton',],
    'awayteam':['Arsenal','Leeds','Brighton','Crystal Palace','Southampton'],
    'fthg':[2,5,1,3,3],
    'ftag':[0,1,2,0,1],
    'hthg':[1,1,1,2,0],
    'htag':[0,0,0,0,1]
})

def get_season(row):
  # Parse the date string of this row into a Timestamp.
  date_ser = pd.to_datetime(row['date'])
  # From 1 August onwards the match belongs to the season starting that year;
  # before August it belongs to the season that started the previous year.
  return date_ser.year if date_ser.month >= 8 else date_ser.year - 1
prem_data_new['season'] = prem_data_new.apply(get_season, axis=1)

print(prem_data_new)

Output:

  div        date    hometown        awayteam  fthg  ftag  hthg  htag  season
0  e0  2021-08-13   Brentford         Arsenal     2     0     1     0    2021
1  e0  2021-08-14  Man United           Leeds     5     1     1     0    2021
2  e0  2020-07-30     Burnley        Brighton     1     2     1     0    2019
3  e0  2020-08-13     Chelsea  Crystal Palace     3     0     2     0    2020
4  e0  2021-08-01     Everton     Southampton     3     1     0     1    2021
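
For larger frames, the same rule as in the demo above can probably be written without a row-wise apply. This is only a sketch with a made-up three-row frame; it assumes the same 1 August cutoff and start-year labelling:

```python
import pandas as pd

# Hypothetical sample using the same 'date' column name as the demo.
df = pd.DataFrame({"date": ["2021-08-13", "2020-07-30", "2021-08-01"]})

# Parse the whole column once instead of once per row.
dates = pd.to_datetime(df["date"])

# Boolean Series: True for matches played before August.
before_august = dates.dt.month < 8

# Casting the mask to int gives 1 where True and 0 where False,
# so pre-August dates are shifted back to the previous year.
df["season"] = dates.dt.year - before_august.astype(int)

print(df)
```

This does the comparison on whole columns at once, which tends to be faster than calling a Python function per row.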