I tried the code below to convert one of my columns, search_departure_date, in the dataframe df to datetime format, but I get the following error.


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 182005 entries, 0 to 182004
Data columns (total 19 columns):
 #   Column                      Non-Null Count   Dtype         
---  ------                      --------------   -----         
 0   date                        182005 non-null  datetime64[ns]
 1   device_type                 182005 non-null  object        
 2   search_origin               182005 non-null  object        
 3   search_destination          182005 non-null  object        
 4   search_route                182005 non-null  object        
 5   search_adult_count          157378 non-null  float64       
 6   search_child_count          157378 non-null  float64       
 7   search_cabin_class          157378 non-null  object        
 8   search_type                 182005 non-null  object        
 9   search_departure_date       182005 non-null  object        
 10  search_arrival_date         97386 non-null   datetime64[ns]

df["search_departure_date"] = pd.to_datetime(df.loc[:, 'search_departure_date'], format='%Y-%m-%d')

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1478-06-14 17:17:56

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1479-03-23 17:17:56

So I am trying to filter out the rows with these timestamp values:

df.loc[df['search_departure_date'] != '1478-06-14 17:17:56']

df.loc[df['search_departure_date'] != '1479-03-23 17:17:56']

How can I do this for multiple timestamps? I noticed they all begin with 1478 or 1479, and joining lots of conditions together with the pipe (|) operator is cumbersome.

CodePudding user response:

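You can try errors='coerce', which sets values that cannot be converted (including these out-of-bounds dates) to NaT instead of raising: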

df["search_departure_date"] = pd.to_datetime(df['search_departure_date'], errors='coerce')
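To see what errors='coerce' does in practice, here is a minimal sketch on made-up sample data (the four values below are hypothetical, not taken from the real dataframe). It assumes the conversion targets nanosecond resolution (datetime64[ns]), as in the error message above; under that assumption the 1478/1479 values should come back as NaT, which can then be counted and dropped.

```python
import pandas as pd

# Hypothetical sample standing in for the real column
df = pd.DataFrame({
    "search_departure_date": [
        "2022-06-14 17:17:56",
        "1478-06-14 17:17:56",  # earlier than pd.Timestamp.min at nanosecond resolution
        "1479-03-23 17:17:56",
        "2022-07-01 09:00:00",
    ]
})

# Values that cannot be converted become NaT instead of raising
converted = pd.to_datetime(df["search_departure_date"], errors="coerce")

# Count how many values were coerced, so the loss of data is visible
n_bad = int(converted.isna().sum())

# Store the converted column and drop the rows that became NaT
df["search_departure_date"] = converted
df = df.dropna(subset=["search_departure_date"]).reset_index(drop=True)
```

Checking n_bad before dropping is worthwhile, because coerce also turns genuinely malformed strings into NaT, not only the out-of-range years.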

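If you want to filter out the rows instead, you can use str.match on the original strings (before converting the column to datetime) and keep only the rows that do not match:

m = df['search_departure_date'].str.match('147(8|9)')
df = df[~m]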
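A small sketch of the string-based filter, again on hypothetical sample data. It assumes the column still holds the original strings; once the column has been converted to datetime, the .str accessor no longer applies. The na=False argument is an addition here so that missing values count as "no match" and the mask stays purely boolean.

```python
import pandas as pd

# Hypothetical sample standing in for the real column (still strings)
df = pd.DataFrame({
    "search_departure_date": [
        "2022-06-14 17:17:56",
        "1478-06-14 17:17:56",
        "1479-03-23 17:17:56",
        "2022-07-01 09:00:00",
    ]
})

# str.match anchors at the start of each string, so this flags
# values beginning with 1478 or 1479; na=False treats missing values as non-matches
m = df["search_departure_date"].str.match(r"147[89]", na=False)

# Keep only the rows that do NOT start with those years
filtered = df.loc[~m]
```

After this, filtered["search_departure_date"] can be passed to pd.to_datetime without hitting the out-of-bounds years.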
