I’m new in python and I would like to add a column based on a date with several condition. My data came from a sheet.
Currently my code looks like this :
#Save data in a DataFrame
df1 = pd.read_excel(stream_io1, sheet_name = "Sheet1", header=0)
# Use Accounting Date
df1['Item - Accounting Date'] = pd.to_datetime(df1['Item - Accounting Date'], format='%Y-%m-%d')
def condition(row):
if (row['Item - Accounting Date'] < '2020-01-01') in row['Item - Accounting Date']:
return "<2020"
if "2020" in row['Item - Accounting Date']:
return "2020"
if (row[(row['Item - Accounting Date'] >= "01/01/2021") & (row['Item - Accounting Date'] <="30/06/2021")]) in row['Item - Accounting Date']:
return "S1 2021"
if (row[(row['Item - Accounting Date'] > "30/06/2021") & (row['Item - Accounting Date'] <="31/12/2021")]) in row['Item - Accounting Date']:
return "S2 2021"
df1['Année'] = df1.apply(condition, axis = 1)
And I have this error message :
TypeError: '<' not supported between instances of 'Timestamp' and 'str'
I understand the error but I don’t know how to resolve it
CodePudding user response:
from datetime import datetime
datetime.strptime("2020-01-01","%Y-%m-%d")
this is how you can convert string to datetime object
CodePudding user response:
It appears you only need to apply condition
function to one column, so you can fix it as below using pd.to_datetime
:
def condition(row):
if row < pd.Timestamp('2020-01-01'):
return "<2020"
if "2020" in row:
return "2020"
if (row >= pd.Timestamp("2021-01-01")) & (row <=pd.Timestamp("2021-06-30")):
return "S1 2021"
if (row > pd.Timestamp("2021-06-30")) & (row <=pd.Timestamp("2021-12-31")):
return "S2 2021"
df1['Année'] = df1['Item - Accounting Date'].apply(condition)