I'm working on a project to analyze tweets and am first trying to convert the created_at column to datetimes.
format = "%Y-%m-%d %H:%M:%S"
df['created_at_datetime'] = pd.to_datetime(df['created_at'], format = format).dt.tz_localize(None)
I keep on getting the following error
I'm in a very introductory class on analyzing Twitter, so I'm not a coding expert at all. This line of code worked in earlier homework assignments, so I'm unsure what the error is now.
I am working in Colab and here is the full thing: https://colab.research.google.com/drive/1XXJsoMQouzH-1t7eWRd1c-fsrI3vYFcf?usp=sharing.
Thank you!
CodePudding user response:
Try this:
format_y = "%Y-%m-%d %H:%M:%S"
pd.to_datetime(df['created_at'], format=format_y).dt.tz_localize(None)
CodePudding user response:
Check that all values in the 'created_at' column are timestamps formatted as you expect. It seems that some row may contain the string "en" instead of a timestamp.
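One way to run that check is sketched below. It assumes the column holds strings and that the expected layout is the "%Y-%m-%d %H:%M:%S" from the question; the sample DataFrame and the `pattern` regex are made up for illustration, so adapt them to the real data.

```python
import pandas as pd

# Hypothetical sample data; the second row mimics a stray "en" value.
df = pd.DataFrame(
    {"created_at": ["2022-05-05 10:15:00", "en", "2022-05-02 08:00:30"]}
)

# Regex mirroring "%Y-%m-%d %H:%M:%S" (an assumption about the expected layout).
pattern = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"

# Cast to str so non-string entries do not break the .str accessor.
matches = df["created_at"].astype(str).str.fullmatch(pattern)

# Rows whose value does not look like a timestamp.
bad_rows = df.loc[~matches, "created_at"]
print(bad_rows)
```

Any value that shows up in `bad_rows` is a candidate for the row that breaks `pd.to_datetime`.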
CodePudding user response:
You need to find the culprit value that doesn't fit. Here's the workflow:
import pandas as pd
raw_dt_series = pd.Series(['2022-05-05', 'foobar','2022-05-02', '202', None])
raw_dt_series_notna = raw_dt_series.dropna()
dt_series = pd.to_datetime(raw_dt_series_notna, errors='coerce')
print(dt_series)
Output:
0 2022-05-05
1 NaT
2 2022-05-02
3 NaT   <- unparseable values become NaT, which pandas treats as missing (like np.nan)
dtype: datetime64[ns]
Now you can pick out the rows that raised the error:
raw_dt_series_notna.loc[dt_series.isna()]
Time to investigate why the given values don't meet the format. After you've found out, adjust the format parameter:
pd.to_datetime(raw_dt_series, format='%YOUR%NEW%FORMAT')
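Applied to a DataFrame like the one in the question, the same workflow might look roughly like this. It is only a sketch: the sample rows are invented, and dropping the unparseable rows is one possible choice, not necessarily the right one for the assignment.

```python
import pandas as pd

# Hypothetical sample data standing in for the tweet table.
df = pd.DataFrame(
    {"created_at": ["2022-05-05 10:15:00", "en", "2022-05-02 08:00:30", None]}
)

fmt = "%Y-%m-%d %H:%M:%S"

# errors="coerce" turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(df["created_at"], format=fmt, errors="coerce")

# Values that were present in the raw column but failed to parse.
culprits = df.loc[parsed.isna() & df["created_at"].notna(), "created_at"]
print(culprits)

# Keep only the rows that parsed cleanly (an assumption about what is wanted).
clean = df.loc[parsed.notna()].copy()
clean["created_at_datetime"] = parsed[parsed.notna()]
print(clean)
```

Separating `culprits` from values that were simply missing makes it easier to tell a bad row (such as "en") apart from an empty cell.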