I have a date column in a Covid data set. The dates appear in the format 20211030
(year, month, day).
However, when I convert that column, every value comes out in 1970.
This is my code:
df["FECHA"] = pd.to_datetime(df["FECHA"], unit='s')
The result is this:
0 MI PERU 1970-08-22 21:58:27
1 SAN JUAN DE LURIGANCHO 1970-08-22 19:27:09
2 YANAHUARA 1970-08-22 19:22:01
3 CUSCO 1970-08-22 22:08:41
4 PANGOA 1970-08-22 21:58:36
Thank you in advance for your help, big hug.
CodePudding user response:
In your case, you don't need pd.to_datetime.
If the column contains strings:
df = pd.DataFrame({'FECHA': ['20211030']})
print(df)
# Output:
FECHA
0 20211030
Use astype (recent pandas versions require an explicit unit such as [ns] in the dtype):
df['FECHA'] = df['FECHA'].astype('datetime64[ns]')
print(df)
# Output:
FECHA
0 2021-10-30
But if the dtype of your FECHA column is integer, you have to cast the column to string first:
df['FECHA'] = df['FECHA'].astype(str).astype('datetime64[ns]')
print(df)
# Output:
FECHA
0 2021-10-30
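If you are not sure which dtype the column has, one option is to cast to string unconditionally and parse with an explicit format. The sketch below swaps in pd.to_datetime for astype; the sample frames and the helper name parse_fecha are made up for illustration, not taken from the question.

```python
import pandas as pd

# Hypothetical samples: the same date stored once as text and once as an integer.
df_str = pd.DataFrame({"FECHA": ["20211030"]})
df_int = pd.DataFrame({"FECHA": [20211030]})

def parse_fecha(col: pd.Series) -> pd.Series:
    """Parse YYYYMMDD values, whether they are stored as str or int."""
    # Casting to str first makes the same call work for both dtypes.
    return pd.to_datetime(col.astype(str), format="%Y%m%d")

parsed_str = parse_fecha(df_str["FECHA"])
parsed_int = parse_fecha(df_int["FECHA"])

print(df_int["FECHA"].dtype)  # inspect the dtype before deciding how to convert
print(parsed_int)
```

Both calls should yield the same datetime column, so the dtype check becomes optional.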
CodePudding user response:
As noted in the comments, the result is caused by the parameters you are passing to the to_datetime function. With unit='s', each value is read as a number of seconds since 1970-01-01, which is why every date lands in 1970. To fix this you should:
- drop the unit parameter, which is unrelated to your date format
- add a format parameter that corresponds to the date format you are using.
Hence, your code should go from:
df["FECHA"] = pd.to_datetime(df["FECHA"], unit='s')
To this:
df["FECHA"] = pd.to_datetime(df["FECHA"], format='%Y%m%d')
To find the right format codes, look up the directives in the Python strftime/strptime documentation. The to_datetime function itself is covered in the pandas documentation.
In our scenario, %Y corresponds to a year with century as a decimal number, %m to a zero-padded month, and %d to a zero-padded day of the month. This matches the 20211030 (year, month, day) format given.
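To see the difference side by side, here is a minimal sketch that parses the same value both ways. The single-element series is a made-up stand-in for the FECHA column, assumed here to hold integers.

```python
import pandas as pd

# Stand-in for the FECHA column (assumed to hold integers like 20211030).
raw = pd.Series([20211030])

# unit='s' reads the integer as seconds elapsed since 1970-01-01,
# and 20211030 seconds is only about 234 days.
as_seconds = pd.to_datetime(raw, unit="s")

# format='%Y%m%d' reads the digits as year, month, day instead.
as_date = pd.to_datetime(raw.astype(str), format="%Y%m%d")

print(as_seconds.iloc[0])  # 1970-08-22 22:10:30
print(as_date.iloc[0])     # 2021-10-30 00:00:00
```

The first result lands on the same 1970-08-22 date as the output shown in the question, which confirms that the unit argument was the cause.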