Home > Back-end >  Create a new column in a dataframe based on a date
Create a new column in a dataframe based on a date

Time:12-23

I’m new in python and I would like to add a column based on a date with several condition. My data came from a sheet.

Currently my code looks like this :

#Save data in a DataFrame

df1 = pd.read_excel(stream_io1, sheet_name = "Sheet1", header=0)

# Use Accounting Date 
df1['Item - Accounting Date'] = pd.to_datetime(df1['Item - Accounting Date'], format='%Y-%m-%d')

def condition(row):
    if (row['Item - Accounting Date'] < '2020-01-01') in row['Item - Accounting Date']:
        return "<2020"
    if "2020" in row['Item - Accounting Date']:
        return "2020"
    if (row[(row['Item - Accounting Date'] >= "01/01/2021") & (row['Item - Accounting Date'] <="30/06/2021")]) in row['Item - Accounting Date']:
        return "S1 2021"    
    if (row[(row['Item - Accounting Date'] > "30/06/2021") & (row['Item - Accounting Date'] <="31/12/2021")]) in row['Item - Accounting Date']:
        return "S2 2021" 

df1['Année'] = df1.apply(condition, axis = 1)

And I have this error message :

TypeError: '<' not supported between instances of 'Timestamp' and 'str'

I understand the error but I don’t know how to resolve it

CodePudding user response:

from datetime import datetime
datetime.strptime("2020-01-01","%Y-%m-%d")

this is how you can convert string to datetime object

CodePudding user response:

It appears you only need to apply condition function to one column, so you can fix it as below using pd.to_datetime:

def condition(row):
    if row < pd.Timestamp('2020-01-01'):
        return "<2020"
    if "2020" in row:
        return "2020"
    if (row >= pd.Timestamp("2021-01-01")) & (row <=pd.Timestamp("2021-06-30")):
        return "S1 2021"    
    if (row > pd.Timestamp("2021-06-30")) & (row <=pd.Timestamp("2021-12-31")):
        return "S2 2021" 

df1['Année'] = df1['Item - Accounting Date'].apply(condition)
  • Related