I tried to convert a column of dates to datetime
using pd.to_datetime(data.date, format='%Y-%m-%d_%H-%M-%S')
but I received the error ValueError: unconverted data remains: .1
I ran:
data.loc[pd.to_datetime(data.date, format='%Y-%m-%d_%H-%M-%S', errors='coerce').isnull(), 'date']
to identify the problem. 119 of the 1,037,808 values in the date
column have an extra ".1" at the end. Apart from the ".1", those dates are fine. How can I remove the ".1" from the end of only those dates and then convert the column to datetime?
Here is an example dataframe that recreates the issue:
import pandas as pd
data = pd.DataFrame({"date" : ["2022-01-15_08-11-00.1","2022-01-15_08-11-30","2022-01-15_08-12-00.1", "2022-01-15_08-12-30"],
"value" : [1,2,3,4]})
I have tried:
data.date = data.date.replace(".1", "")
and
data = data.replace(".1", "")
but these did not remove the ".1". The final result should look like this:
data = pd.DataFrame({"date" : ["2022-01-15_08-11-00","2022-01-15_08-11-30","2022-01-15_08-12-00", "2022-01-15_08-12-30"],
"value" : [1,2,3,4]})
CodePudding user response:
You can use pandas.Series.replace with regex=True
to strip the trailing dot and digits:
data["date"] = pd.to_datetime(data["date"].replace(r"\.\d+$", "", regex=True),
                              format="%Y-%m-%d_%H-%M-%S")
# Output :
print(data)
print(data.dtypes)
                 date  value
0 2022-01-15 08:11:00      1
1 2022-01-15 08:11:30      2
2 2022-01-15 08:12:00      3
3 2022-01-15 08:12:30      4
date     datetime64[ns]
value             int64
dtype: object
If you only want the cleaned strings rather than datetimes, use just data["date"].replace(r"\.\d+$", "", regex=True)
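As a side note on why the attempts in the question had no effect: without regex=True, Series.replace compares ".1" against the whole cell value, so a string that merely ends in ".1" is left alone. The sketch below reuses the example dataframe from the question. The names unchanged, cleaned and parsed are only illustrative, and Series.str.replace is shown as one possible substring-based alternative, not the only way to do it.

```python
import pandas as pd

data = pd.DataFrame({
    "date": ["2022-01-15_08-11-00.1", "2022-01-15_08-11-30",
             "2022-01-15_08-12-00.1", "2022-01-15_08-12-30"],
    "value": [1, 2, 3, 4],
})

# Without regex=True, Series.replace only swaps cells whose ENTIRE value
# equals ".1", so nothing changes here.
unchanged = data["date"].replace(".1", "")

# Series.str.replace works on substrings. The pattern is anchored with $
# so only a trailing dot followed by digits is removed.
cleaned = data["date"].str.replace(r"\.\d+$", "", regex=True)

# The cleaned strings now match the format from the question.
parsed = pd.to_datetime(cleaned, format="%Y-%m-%d_%H-%M-%S")
```

The $ anchor is a precaution: it keeps the pattern from touching a dot that might appear elsewhere in a value.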
CodePudding user response:
data = data.assign(date=pd.to_datetime(data['date'].str.split(r'\.').str[0], format="%Y-%m-%d_%H-%M-%S"))
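If the stray suffix is always exactly ".1", another option is to strip that literal suffix and then check that nothing is left unparsed. This is only a sketch: it assumes pandas 1.4 or newer (where Series.str.removesuffix is available), and the names stripped, parsed and n_bad are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "date": ["2022-01-15_08-11-00.1", "2022-01-15_08-11-30",
             "2022-01-15_08-12-00.1", "2022-01-15_08-12-30"],
    "value": [1, 2, 3, 4],
})

# Remove only the literal ".1" suffix; values without it are left untouched.
# Assumes pandas >= 1.4 for Series.str.removesuffix.
stripped = data["date"].str.removesuffix(".1")

# errors="coerce" turns any value that still fails to parse into NaT,
# so leftover problem rows can be counted instead of raising.
parsed = pd.to_datetime(stripped, format="%Y-%m-%d_%H-%M-%S", errors="coerce")
n_bad = int(parsed.isna().sum())

data = data.assign(date=parsed)
```

On the full column, a non-zero n_bad would mean some values are malformed in a way other than the ".1" suffix.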