How to convert all column values to dates?

Time:01-15

I'm trying to convert all the values in the column below to dates.

Event Date
2020-07-16 00:00:00
31/03/2022, 26/11/2018, 31/01/2028

This is just a small section of the data - there are more columns/rows.

I've tried to split out the cells with multiple values using the below:

df["Event Date"] = df["Event Date"].str.replace(' ', '')
df["Event Date"] = df["Event Date"].str.split(",")
df= df.explode("Event Date")

The problem is that this sets any cell without a ',' (e.g. '2020-07-16 00:00:00') to NaN.

Is there any way to separate the values with a ',' and set the entire column to date types?

CodePudding user response:

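Here is one approach using pandas.Series.str.split and pandas.Series.explode:

import pandas as pd

s_dates = (
    df["Event Date"]
    .str.split(",")
    .explode(ignore_index=True)
    .str.strip()  # remove the space left after each comma
    .apply(pd.to_datetime, dayfirst=True)  # parse each value on its own
)

Output: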

0   2020-07-16
1   2022-03-31
2   2018-11-26
3   2028-01-31
Name: Event Date, dtype: datetime64[ns]
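
One possible cause of the NaN values described in the question is that some cells (for example, when the data was read from a spreadsheet) already hold datetime objects rather than strings; the .str accessor returns NaN for non-string elements. The following is a minimal sketch under that assumption, with a hypothetical two-row frame standing in for the real data. It casts the column to str before splitting:

```python
import pandas as pd

# Hypothetical sample: the first cell is already a Timestamp, the second is a string.
df = pd.DataFrame(
    {"Event Date": [pd.Timestamp("2020-07-16"), "31/03/2022, 26/11/2018, 31/01/2028"]}
)

# The .str accessor yields NaN for the non-string cell.
nan_count = df["Event Date"].str.split(",").isna().sum()

s_dates = (
    df["Event Date"]
    .astype(str)                  # Timestamp -> "2020-07-16 00:00:00"
    .str.split(",")
    .explode(ignore_index=True)
    .str.strip()                  # drop the space left after each comma
    .apply(pd.to_datetime, dayfirst=True)  # parse each value on its own
)

print(nan_count)
print(s_dates)
```

If the column turns out to contain only strings, the astype(str) call is harmless, so it can be left in place either way.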

CodePudding user response:

Your example table mixes date formats from row to row. The idea is to try one parsing technique and fall back to another if it fails. Needing explicit loops and having such widely varying formats in a single column are red flags in the script's design. I recommend using datetime and dateutil to handle the dates.

from datetime import datetime
from dateutil import parser

date_strings = ["2020-07-16 00:00:00", "31/03/2022, 26/11/2018, 31/01/2028"]  # Get these from your table.
parsed_dates = []

for date_string in date_strings:
    try:
        # strptime
        date_object = datetime.strptime(date_string, "%Y-%m-%d %H:%M:%S")
        parsed_dates.append(date_object)
    except ValueError:
        # Not in the strptime format: split on commas and parse each piece with dateutil
        parts = date_string.split(",")
        for date_str in parts:
            date_str = date_str.strip()
            date_object = parser.parse(date_str, dayfirst=True)
            parsed_dates.append(date_object)

print(parsed_dates)

Try the code on Trinket: https://trinket.io/python3/95c0d14271
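
The "try one format, then another" idea can also be sketched with the standard library alone. This is not the answer's code: the FORMATS tuple and the parse_cell helper are hypothetical names, and the two formats listed are only the ones visible in the sample data, so the tuple would need extending for anything else.

```python
from datetime import datetime

# Candidate formats, tried in order (assumed from the sample data only).
FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y")


def parse_cell(cell):
    """Split a cell on commas and parse each piece with the first matching format."""
    parsed = []
    for piece in cell.split(","):
        piece = piece.strip()
        for fmt in FORMATS:
            try:
                parsed.append(datetime.strptime(piece, fmt))
                break
            except ValueError:
                continue
        else:
            # No format matched: fail loudly rather than guess.
            raise ValueError(f"Unrecognised date: {piece!r}")
    return parsed


cells = ["2020-07-16 00:00:00", "31/03/2022, 26/11/2018, 31/01/2028"]
all_dates = [d for cell in cells for d in parse_cell(cell)]
print(all_dates)
```

Listing the formats explicitly avoids the guessing that a general-purpose parser does, at the cost of raising an error for any format not in the tuple.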

CodePudding user response:

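You can use a combination of split and explode to separate the dates, then pass infer_datetime_format=True to pd.to_datetime to convert the mixed date formats. Note that infer_datetime_format is deprecated as of pandas 2.0, so the output shown here was presumably produced with an earlier version.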

df = df.assign(dates=df['dates'].str.split(',')).explode('dates')
df
Out[18]: 
                 dates
0  2020-07-16 00:00:00
1           31/03/2022
1           26/11/2018
1           31/01/2028 

df.dates = pd.to_datetime(df.dates,  infer_datetime_format=True)

df.dates
Out[20]: 
0   2020-07-16
1   2022-03-31
1   2018-11-26
1   2028-01-31
Name: dates, dtype: datetime64[ns]
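
On pandas 2.0 or later, a rough equivalent is format="mixed", which parses each value individually. The sketch below assumes pandas >= 2.0 and uses a hypothetical frame shaped like the example. dayfirst=True is added on the assumption that the slash-separated dates are day-first, as they are in the sample data.

```python
import pandas as pd

# Hypothetical frame mirroring the example column.
df = pd.DataFrame(
    {"dates": ["2020-07-16 00:00:00", "31/03/2022, 26/11/2018, 31/01/2028"]}
)

# One row per date; explode keeps the original row label on every piece.
df = df.assign(dates=df["dates"].str.split(",")).explode("dates")

# Strip the space left after each comma, then parse every value on its own.
df["dates"] = pd.to_datetime(df["dates"].str.strip(), format="mixed", dayfirst=True)

print(df)
```

Because explode repeats the original index (0, 1, 1, 1), each parsed date can still be traced back to the row it came from.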