How to handle different date formats in a data frame

Time:06-08

I have spent 2 days looking for an answer to this without any luck, so here I am. Also, I am super new to Python.

I have a script that reads in multiple files. Each file uses a different date format, which I am able to handle using:

temp_df['Invoice Date'] = pd.to_datetime(temp_df['Invoice Date'],format='%d/%m/%Y')
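When each file sticks to one format, the call above can be driven by a lookup from file to format. This is a minimal sketch: the file names in `DATE_FORMATS` and the helper `parse_invoice_dates` are hypothetical, and in-memory frames stand in for the real `pd.read_csv` calls.

```python
import pandas as pd

# Hypothetical mapping from each input file to the date format it uses;
# these names and formats are placeholders, not taken from the question
DATE_FORMATS = {
    'invoices_a.csv': '%d/%m/%Y',
    'invoices_b.csv': '%Y-%m-%d',
}


def parse_invoice_dates(temp_df, filename):
    """Parse 'Invoice Date' with the format registered for this file (hypothetical helper)."""
    fmt = DATE_FORMATS[filename]
    temp_df['Invoice Date'] = pd.to_datetime(temp_df['Invoice Date'], format=fmt)
    return temp_df


# In-memory frames stand in for pd.read_csv(filename)
frames = {
    'invoices_a.csv': pd.DataFrame({'Invoice Date': ['04/03/2022']}),
    'invoices_b.csv': pd.DataFrame({'Invoice Date': ['2022-03-17']}),
}

# Parse each file with its own format, then stack them into one frame
combined = pd.concat(
    [parse_invoice_dates(df, name) for name, df in frames.items()],
    ignore_index=True,
)
print(combined['Invoice Date'].tolist())
```

Keeping the formats in one dictionary means adding a new file only needs a new entry, not a new `to_datetime` line.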

I have a few issues that I can't seem to solve:

1. One of my files has dates such as 2022-03-17 and 04/03/2022, i.e. YYYY-MM-DD and DD/MM/YYYY respectively. What I'm trying to do is apply a different to_datetime() call to each format, but I cannot for the life of me figure out how. I tried not specifying a format, but then pandas gets confused and misreads the rest of the dates too. Please note that the data is only for March.


So my idea was: if

pd.to_datetime(temp_df['Invoice Date'], format='%d/%m/%Y')

fails or raises an error, try

pd.to_datetime(temp_df['Invoice Date'], format='%Y-%m-%d')
2. One of my files is missing the date for one transaction, and I want to fill in the first day of the current month for that record only. I tried the code below, but it applies the date to all records:

     if temp_df['Distributor Invoice Date'].isnull():
         temp_df['Distributor Invoice Date'] = datetime.date.today().replace(day=1)
    
  2. I want a new column called Month that uses the date from temp_df['Invoice Date'].
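The fallback idea from item 1 (try one format, and if it fails, try the other) can be applied per value rather than to the whole column. A minimal sketch, assuming the only two formats present are DD/MM/YYYY and YYYY-MM-DD; the sample values are made up:

```python
import pandas as pd

# Hypothetical sample mirroring item 1: two formats mixed in one column
temp_df = pd.DataFrame({'Invoice Date': ['2022-03-17', '04/03/2022', '2022-03-01']})
raw = temp_df['Invoice Date']

# First pass: DD/MM/YYYY; values in any other format become NaT instead of raising
day_first = pd.to_datetime(raw, format='%d/%m/%Y', errors='coerce')

# Second pass: YYYY-MM-DD, same idea
iso = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

# Keep the first successful parse for each row; rows matching neither stay NaT
temp_df['Invoice Date'] = day_first.fillna(iso)

# Count rows that matched neither format so they can be reviewed
print('unparsed rows:', temp_df['Invoice Date'].isna().sum())
```

Because the data is only for March, checking that `temp_df['Invoice Date'].dt.month` equals 3 on every row is a cheap way to catch a date that was read with day and month swapped.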

Please let me know if anything is unclear and I will respond ASAP.

Thanks, Waleed

Answer:

Try:

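import datetime
import pandas as pd

# 1. Date format (on pandas 2.0+, a column that mixes formats also needs format='mixed')
temp_df['Invoice Date'] = pd.to_datetime(temp_df['Invoice Date'], dayfirst=True)

# 2. Fill missing transactions with the first day of the current month
d = pd.Timestamp(datetime.date.today().replace(day=1))
temp_df['Distributor Invoice Date'] = temp_df['Distributor Invoice Date'].fillna(d)

# 3. New column: month number (use .dt.month_name() to get 'March' instead)
temp_df['Month'] = temp_df['Invoice Date'].dt.month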
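An `if` statement needs a single True/False, so it cannot pick out individual rows; pandas selects rows with a boolean mask instead, which is why the `if` in item 2 does not work row by row. A minimal sketch with made-up sample data; the extra `Month Name` column is a hypothetical addition, not something the question asked for:

```python
import pandas as pd

# Hypothetical sample: the second transaction has no distributor invoice date
temp_df = pd.DataFrame({
    'Invoice Date': pd.to_datetime(['2022-03-17', '2022-03-04']),
    'Distributor Invoice Date': pd.to_datetime(['2022-03-15', None]),
})

# First day of the current month as a Timestamp, so the column stays datetime64
first_of_month = pd.Timestamp.today().normalize().replace(day=1)

# Boolean mask: True only for rows whose date is missing
missing = temp_df['Distributor Invoice Date'].isna()

# .loc with the mask assigns to those rows only (same effect as fillna here)
temp_df.loc[missing, 'Distributor Invoice Date'] = first_of_month

# Month as a number, plus a hypothetical name column
temp_df['Month'] = temp_df['Invoice Date'].dt.month
temp_df['Month Name'] = temp_df['Invoice Date'].dt.month_name()
print(temp_df)
```

The mask form is useful when the condition is more than "is missing", for example filling only rows from one file; for the plain missing-value case, `fillna` is shorter.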