I tried this code to convert one of my columns, search_departure_date, in the dataframe df to datetime format, but I get the following error.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 182005 entries, 0 to 182004
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 182005 non-null datetime64[ns]
1 device_type 182005 non-null object
2 search_origin 182005 non-null object
3 search_destination 182005 non-null object
4 search_route 182005 non-null object
5 search_adult_count 157378 non-null float64
6 search_child_count 157378 non-null float64
7 search_cabin_class 157378 non-null object
8 search_type 182005 non-null object
9 search_departure_date 182005 non-null object
10 search_arrival_date 97386 non-null datetime64[ns]
df["search_departure_date"] = pd.to_datetime(df.loc[:, 'search_departure_date'], format='%Y-%m-%d')
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1478-06-14 17:17:56
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1479-03-23 17:17:56
So I am trying to filter out the rows with these timestamp values:
df.loc[df['search_departure_date'] != '1478-06-14 17:17:56']
df.loc[df['search_departure_date'] != '1479-03-23 17:17:56']
How can I do this for multiple timestamps? I noticed they all begin with 1478 or 1479; joining lots of them with the pipe (|) operator is cumbersome.
CodePudding user response:
You can try errors='coerce', which turns unparseable or out-of-bounds values into NaT instead of raising:
df["search_departure_date"] = pd.to_datetime(df['search_departure_date'], errors='coerce')
If instead you want to filter out those rows before converting, you can build a boolean mask with str.match:
m = df['search_departure_date'].str.match('147(8|9)')
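m is True for the rows whose string starts with 1478 or 1479, so you can invert it with ~ to keep the rest and then convert. A minimal sketch, assuming those prefixes are the only out-of-bounds values:
# drop the 1478/1479 rows, then convert the remainder
df = df[~m]
df['search_departure_date'] = pd.to_datetime(df['search_departure_date'])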