This should be a simple task, but I cannot figure it out!
I have a dataframe containing English Premier League football data. All I want to do is add a Season column and use the date to set the season, e.g. Date = 2021-08-13, Season = 2021.
But try as I might, I cannot make it work. How do I do this? My last attempt:
def get_season(row):
    if (row['Date'] > pd.to_datetime("01/08/2021")):
        season = "2021"
prem_data_new['Season'] = prem_data_new.apply(lambda row: get_season(row),axis = 1)
prem_data_new
CodePudding user response:
- Check the dtype of prem_data_new['Date'].
- If it is datetime, use the .dt accessor to get the year:
prem_data_new['Season'] = prem_data_new['Date'].dt.year
- If not, convert it first:
prem_data_new['Date'] = pd.to_datetime(prem_data_new['Date'], errors='coerce')
and then go back to point 2.
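As a rough sketch of the three steps above, assuming a small made-up frame in place of the real data (the sample values, including the deliberately bad string, are only illustrative):

```python
import pandas as pd

# Hypothetical sample data; the real prem_data_new comes from the asker's file.
prem_data_new = pd.DataFrame({"Date": ["2021-08-13", "2020-07-30", "not a date"]})

# Step 1: check whether the column already holds datetimes.
if not pd.api.types.is_datetime64_any_dtype(prem_data_new["Date"]):
    # Step 3: convert; strings that cannot be parsed become NaT instead of raising.
    prem_data_new["Date"] = pd.to_datetime(prem_data_new["Date"], errors="coerce")

# Step 2: the .dt accessor exposes the calendar year of every row.
prem_data_new["Season"] = prem_data_new["Date"].dt.year
print(prem_data_new)
```

If the raw strings are day-first (for example 01/08/2021 meaning 1 August), note that pd.to_datetime reads that particular string month-first by default, as 8 January; passing dayfirst=True is one way around that.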
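To get the season, you can try this function:
def func(ser):
    lst = []
    for i in range(len(ser)):
        # .iloc indexes by position, so this works whatever the index labels are
        if ser.iloc[i].month < 8:
            lst.append(ser.iloc[i].year)
        else:
            lst.append(ser.iloc[i].year + 1)
    return lst
prem_data_new['Season'] = func(prem_data_new['Date'])
or
prem_data_new['Season'] = [x.year if x.month < 8 else x.year + 1 for x in prem_data_new['Date']]
Note that this labels a season by the year in which it ends (2021-08-13 gives 2022). To label it by the starting year, as in the question, use x.year - 1 before August and x.year otherwise.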
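The same month-based rule can also be written without a Python loop. The following is only a sketch: the helper name season_start_year and the sample dates are made up for illustration, and it assumes the convention from the question, where a season is labelled by the year it starts in (2021-08-13 gives 2021).

```python
import pandas as pd

def season_start_year(dates: pd.Series) -> pd.Series:
    """Label each date with the year its season started (seasons start in August)."""
    # (month < 8) is True for January to July; as an int it subtracts 1 from the year.
    return dates.dt.year - (dates.dt.month < 8).astype(int)

# Hypothetical sample dates standing in for prem_data_new['Date'].
dates = pd.Series(pd.to_datetime(["2021-08-13", "2020-07-30", "2021-08-01"]))
seasons = season_start_year(dates)
print(seasons.tolist())
```

Because it works on whole columns through the .dt accessor, this tends to be faster than a row-by-row loop on large frames.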
CodePudding user response:
You can also use the map function: define a function that returns the season year for a single date, map it over the Date column, and assign the result to a new column of the dataframe.
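A minimal sketch of that idea, assuming the Date column is already datetime and that a season is labelled by its starting year; the function name season_of and the sample rows are hypothetical:

```python
import pandas as pd

def season_of(date: pd.Timestamp) -> int:
    # August onwards belongs to that year's season, earlier months to the previous one.
    return date.year if date.month >= 8 else date.year - 1

# Hypothetical sample frame standing in for the real data.
prem_data_new = pd.DataFrame({"Date": pd.to_datetime(["2021-08-13", "2020-07-30"])})

# Series.map calls season_of once per value and returns a new Series.
prem_data_new["Season"] = prem_data_new["Date"].map(season_of)
print(prem_data_new)
```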
CodePudding user response:
Since you are deciding the season based on the month and day (e.g. 1st of Aug), your function should compare the month and day of the date field.
Demo:
import pandas as pd

prem_data_new = pd.DataFrame({
    'div': ['e0', 'e0', 'e0', 'e0', 'e0'],
    'date': ['2021-08-13', '2020-08-14', '2020-07-30', '2020-08-13', '2021-08-01'],
    'hometown': ['Brentford', 'Man United', 'Burnley', 'Chelesa', 'Everton'],
    'awayteam': ['Arsenal', 'Leeds', 'Brighton', 'Crystal Palace', 'Southampton'],
    'fthg': [2, 5, 1, 3, 3],
    'ftag': [0, 1, 2, 0, 1],
    'hthg': [1, 1, 1, 2, 0],
    'htag': [0, 0, 0, 0, 1]
})

def get_season(row):
    date_ser = pd.to_datetime(row['date'])
    # Seasons start on 1 August: August onwards belongs to that year's season,
    # earlier months to the previous year's season (the day is always >= 1).
    return date_ser.year if date_ser.month >= 8 else date_ser.year - 1

prem_data_new['season'] = prem_data_new.apply(get_season, axis=1)
print(prem_data_new)
Output:
   div        date    hometown        awayteam  fthg  ftag  hthg  htag  season
0   e0  2021-08-13   Brentford         Arsenal     2     0     1     0    2021
1   e0  2021-08-14  Man United           Leeds     5     1     1     0    2021
2   e0  2020-07-30     Burnley        Brighton     1     2     1     0    2019
3   e0  2020-08-13     Chelesa  Crystal Palace     3     0     2     0    2020
4   e0  2021-08-01     Everton     Southampton     3     1     0     1    2021