How to change all data types to date for a column with multiple data types?

Time:01-17

How can I change the data type of fields in a column to date type if the data types are as below:

<class 'datetime.datetime'>    296
<class 'str'>                  116
<class 'float'>                  8
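
For reference, a type breakdown like the one above can be produced by mapping `type` over the column. This is only a minimal sketch: the three-row frame is a hypothetical stand-in for the real file, with one datetime, one empty cell (NaN, which is a float) and one comma-separated string.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample mirroring the question: a datetime, an empty cell, a string
df = pd.DataFrame({
    'Col1': [1, 2, 3],
    'Event Date': [datetime(2020, 7, 16), float('nan'),
                   '31/03/2022, 26/11/2018, 31/01/2028'],
})

# Count how many cells hold each Python type
type_counts = df['Event Date'].map(type).value_counts()
print(type_counts)
```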

My aim is to ignore the empty rows, duplicate the rows with multiple dates and split them out, and convert the datetime values to dates.

A small section of the column to illustrate what the data looks like (Second row is empty for Event Date column):

Col1 Event Date
1 2020-07-16 00:00:00
2
3 31/03/2022, 26/11/2018, 31/01/2028

I've tried a number of things to get this to work but have had no luck. I tried looping through the rows to convert each one, but looping isn't the best option. I also tried to split and explode the cells with multiple dates as below, but this raises an error (dateutil.parser._parser.ParserError: Unknown string format: 31/03/2022, 26/11/2018, 31/01/2028 present at position 3).

df=auto_test_file.assign(dates=auto_test_file['Event Date'].str.split(',')).explode('dates')
pd.to_datetime(df['Event Date'])

CodePudding user response:

You could use explode

df=df.assign(dates=df['Event Date'].str.split(',')).explode('dates')
df
Out[93]: 
   Col1                          Event Date                dates
0     1                 2020-07-16 00:00:00  2020-07-16 00:00:00
1     2                                 NaN                  NaN
2     3  31/03/2022, 26/11/2018, 31/01/2028           31/03/2022
2     3  31/03/2022, 26/11/2018, 31/01/2028           26/11/2018
2     3  31/03/2022, 26/11/2018, 31/01/2028           31/01/2028

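Then convert the new dates column to datetime (not the original Event Date column, which still holds the unsplit strings and is what raises the ParserError in the question)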

pd.to_datetime(df.dates)
Out[94]: 
0   2020-07-16
1          NaT
2   2022-03-31
2   2018-11-26
2   2028-01-31
Name: dates, dtype: datetime64[ns]
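
One caveat: the example above has every cell stored as a string. In the question's column most cells are already datetime objects, and `.str.split` returns NaN for any cell that is not a string, so those dates would be dropped. A possible workaround is to normalise each cell to a list of dates before exploding. The sketch below is only an illustration: `to_dates` is a hypothetical helper, the sample frame is made up to mirror the question, and it assumes every string cell uses the dd/mm/yyyy format.

```python
from datetime import date, datetime

import pandas as pd


def to_dates(value):
    """Hypothetical helper: return a list of datetime.date for one cell."""
    if isinstance(value, datetime):      # already a datetime (pd.Timestamp included)
        return [value.date()]
    if isinstance(value, str):           # assumed format: 'dd/mm/yyyy, dd/mm/yyyy, ...'
        parts = [p.strip() for p in value.split(',') if p.strip()]
        return [datetime.strptime(p, '%d/%m/%Y').date() for p in parts]
    return []                            # NaN / None: no dates in this cell


# Made-up sample with the three types reported in the question
df = pd.DataFrame({
    'Col1': [1, 2, 3],
    'Event Date': [datetime(2020, 7, 16), float('nan'),
                   '31/03/2022, 26/11/2018, 31/01/2028'],
})

out = (
    df.assign(dates=df['Event Date'].map(to_dates))
      .explode('dates')                  # one row per date; empty lists become NaN
      .dropna(subset=['dates'])          # drop the rows that had no date
      .reset_index(drop=True)
)
print(out[['Col1', 'dates']])
```

The resulting `dates` column holds plain `datetime.date` objects, so it has object dtype rather than datetime64.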

CodePudding user response:

Suggested code

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Col1': [1, 2, 3],
    'Event': ['2020-07-16 00:00:00','' , '31/03/2022, 26/11/2018, 31/01/2028'],
})

#    Col1                               Event
# 0     1                 2020-07-16 00:00:00
# 1     2                                    
# 2     3  31/03/2022, 26/11/2018, 31/01/2028

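# 1 - Split each Event cell on commas and strip stray whitespace
df['Event'] = df['Event'].apply(lambda r: [d.strip() for d in r.split(',')])
# 2 - Explode into one row per date, then reindex
df = df.explode(column='Event').reset_index(drop=True)
# 3 - Replace '' with NaN
df = df.replace(to_replace='', value=np.nan)
# 4 - Drop rows with NaN
df = df.dropna()
# 5 - Convert to date, parsing each value on its own because the formats are mixed;
#     dayfirst=True reads 31/03/2022 as 31 March
df['Event'] = df['Event'].apply(lambda s: pd.to_datetime(s, dayfirst=True).date())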

Output

   Col1       Event
0     1  2020-07-16
2     3  2022-03-31
3     3  2018-11-26
4     3  2028-01-31
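
Because the comma-separated dates in the question are written day-first (31/03/2022), it may be worth passing `dayfirst=True` to `pd.to_datetime`. Values such as 31/03/2022 can only be read one way, but an ambiguous value is read month-first by default. A small sketch of the difference, using a made-up date that does not appear in the question's data:

```python
import pandas as pd

ambiguous = '01/02/2022'  # hypothetical value, not from the question's data

month_first = pd.to_datetime(ambiguous)                # default: read as month/day/year
day_first = pd.to_datetime(ambiguous, dayfirst=True)   # read as day/month/year

print(month_first.date())  # 2022-01-02
print(day_first.date())    # 2022-02-01
```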