I'm trying to convert all the values in the column below to dates.
Event Date |
---|
2020-07-16 00:00:00 |
31/03/2022, 26/11/2018, 31/01/2028 |
This is just a small section of the data - there are more columns/rows.
I've tried to split out the cells with multiple values using the below:
df["Event Date"] = df["Event Date"].str.replace(' ', '')
df["Event Date"] = df["Event Date"].str.split(",")
df = df.explode("Event Date")
The issue is that any cell without a ',' (e.g. '2020-07-16 00:00:00') is set to NaN.
Is there any way to separate the values with a ',' and set the entire column to date types?
CodePudding user response:
Here is a proposition with pandas.Series.str.split and pandas.Series.explode:
s_dates = (
    df["Event Date"]
    .str.split(",")
    .explode(ignore_index=True)
    .apply(pd.to_datetime, dayfirst=True)
)
Output:
0   2020-07-16
1   2022-03-31
2   2018-11-26
3   2028-01-31
Name: Event Date, dtype: datetime64[ns]
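The question notes that cells without a comma end up as NaN. One likely cause (an assumption, since the column's actual dtypes aren't shown) is that those cells are already datetime objects rather than strings, so the `.str` methods return NaN for them. A minimal sketch that casts every cell to a string first; the sample frame below is hypothetical and only mimics the two rows from the question:

```python
import pandas as pd

# Hypothetical sample: the first cell is already a Timestamp (not a string),
# the second is a comma-separated string, as in the question's table.
df = pd.DataFrame(
    {"Event Date": [pd.Timestamp("2020-07-16"), "31/03/2022, 26/11/2018, 31/01/2028"]}
)

s_dates = (
    df["Event Date"]
    .astype(str)                 # make every cell a string so .str works on all rows
    .str.split(",")              # one list of date strings per row
    .explode(ignore_index=True)  # one date string per row
    .str.strip()                 # drop the space left after each comma
    .apply(pd.to_datetime, dayfirst=True)  # parse each value on its own
)
print(s_dates)
```

Parsing each value separately with `apply` is slower than one vectorised call, but it lets every cell be interpreted independently of the formats used in the other rows.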
CodePudding user response:
Your example table mixes date formats from row to row. The idea is to try one date-parsing technique and fall back to another if it fails. Needing loops and having such wide variation in the data are red flags for the script's design. I recommend using datetime and dateutil to handle the dates.
from datetime import datetime
from dateutil import parser

# Get these from your table.
date_strings = ["2020-07-16 00:00:00", "31/03/2022, 26/11/2018, 31/01/2028"]
parsed_dates = []
for date_string in date_strings:
    try:
        # First attempt: a single timestamp via strptime.
        date_object = datetime.strptime(date_string, "%Y-%m-%d %H:%M:%S")
        parsed_dates.append(date_object)
    except ValueError:
        # Fallback: split on commas and parse each day-first date with parser.parse().
        for date_str in date_string.split(","):
            date_object = parser.parse(date_str.strip(), dayfirst=True)
            parsed_dates.append(date_object)
print(parsed_dates)
Try the code on Trinket: https://trinket.io/python3/95c0d14271
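The try-one-format-then-another idea above can also be written with the standard library alone. This is only a sketch: `KNOWN_FORMATS` and `parse_cell` are hypothetical names introduced here, and the two formats are assumed from the sample rows in the question.

```python
from datetime import datetime

# Assumed formats, tried in order; extend this tuple to match your real data.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y")


def parse_cell(cell):
    """Return the datetimes found in one comma-separated cell."""
    parsed = []
    for part in cell.split(","):
        part = part.strip()
        for fmt in KNOWN_FORMATS:
            try:
                parsed.append(datetime.strptime(part, fmt))
                break  # this format matched, move on to the next part
            except ValueError:
                continue  # try the next known format
        else:
            # No format matched: fail loudly instead of guessing.
            raise ValueError(f"Unrecognised date: {part!r}")
    return parsed


cells = ["2020-07-16 00:00:00", "31/03/2022, 26/11/2018, 31/01/2028"]
parsed_dates = [d for cell in cells for d in parse_cell(cell)]
print(parsed_dates)
```

Listing the formats explicitly is stricter than dateutil's guessing: an unexpected value raises an error rather than being silently misread.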
CodePudding user response:
You can use a combination of split and explode to separate the dates, and then use infer_datetime_format to convert the mixed date formats:
df = df.assign(dates=df['dates'].str.split(',')).explode('dates')
df
Out[18]:
                 dates
0  2020-07-16 00:00:00
1           31/03/2022
1           26/11/2018
1           31/01/2028
df.dates = pd.to_datetime(df.dates, infer_datetime_format=True)
df.dates
Out[20]:
0   2020-07-16
1   2022-03-31
1   2018-11-26
1   2028-01-31
Name: dates, dtype: datetime64[ns]
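One caveat about relying on inference: every slash-separated date in the sample has a day greater than 12, so it cannot be misread as month-first. An ambiguous value is a different story. The sketch below uses a made-up date string (not from the question) to show how `dayfirst` changes the result:

```python
import pandas as pd

ambiguous = "01/02/2022"  # hypothetical value: 1 February or 2 January?

month_first = pd.to_datetime(ambiguous)               # default reads it month-first
day_first = pd.to_datetime(ambiguous, dayfirst=True)  # reads it day-first

print(month_first.date(), day_first.date())
```

If your data is day-first, passing `dayfirst=True` is the safer choice. Also, as far as I know, `infer_datetime_format` is deprecated from pandas 2.0 onwards, where a column with mixed formats needs `format="mixed"` instead; check this against the version you have installed.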