Pandas date column: problem with date conversion

Time:11-30

I have a date column in a Covid data set. The dates appear in the format 20211030 (year, month, day).

However, when I convert that column, every date comes out in 1970.

This is my code:

df["FECHA"] = pd.to_datetime(df["FECHA"], unit='s')

The result is this:

0   MI PERU 1970-08-22 21:58:27
1   SAN JUAN DE LURIGANCHO  1970-08-22 19:27:09
2   YANAHUARA   1970-08-22 19:22:01
3   CUSCO   1970-08-22 22:08:41
4   PANGOA  1970-08-22 21:58:36

Thank you in advance for your help, big hug.

CodePudding user response:

In your case, you don't need pd.to_datetime if the column contains strings:

df = pd.DataFrame({'FECHA': ['20211030']})
print(df)

# Output:
      FECHA
0  20211030

Use astype:

# pandas 2.0 and later reject the unit-less 'datetime64', so spell out the unit
df['FECHA'] = df['FECHA'].astype('datetime64[ns]')
print(df)

# Output:
       FECHA
0 2021-10-30

But if the dtype of your FECHA column is integer, you have to cast the column to string first:

df['FECHA'] = df['FECHA'].astype(str).astype('datetime64[ns]')
print(df)

# Output:
       FECHA
0 2021-10-30
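
The two cases above can be folded into one step. This is only a minimal sketch: the helper name fecha_to_datetime is hypothetical, and it assumes every value is a well-formed YYYYMMDD date.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def fecha_to_datetime(column: pd.Series) -> pd.Series:
    """Hypothetical helper: convert a YYYYMMDD column (int or str) to datetimes."""
    # Integers must become strings first, otherwise they are not read as dates.
    if is_integer_dtype(column):
        column = column.astype(str)
    return column.astype("datetime64[ns]")


as_int = pd.DataFrame({"FECHA": [20211030]})
as_str = pd.DataFrame({"FECHA": ["20211030"]})

print(fecha_to_datetime(as_int["FECHA"]))
print(fecha_to_datetime(as_str["FECHA"]))
```

Checking the dtype first (for example with df.dtypes) tells you which of the two cases applies to your data.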

CodePudding user response:

As noted in the comments, the result is caused by the parameters you are passing to the to_datetime function. To fix this you should:

  1. drop the unit parameter: unit='s' makes pandas read each number as seconds since 1970-01-01, which is why every date lands in 1970
  2. add a format parameter that corresponds to the date format you are using.

Hence, your code should go from:

df["FECHA"] = pd.to_datetime(df["FECHA"], unit='s')

To this:

df["FECHA"] = pd.to_datetime(df["FECHA"], format='%Y%m%d')
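
To see where the 1970 dates come from, here is a small sketch that uses the single value 20211030 from the question. The variable names are illustrative only.

```python
import pandas as pd

raw = 20211030  # a value as it might be stored in the FECHA column

# unit='s' treats the number as seconds elapsed since 1970-01-01 00:00:00.
as_seconds = pd.to_datetime(raw, unit="s")
print(as_seconds)  # 1970-08-22 22:10:30

# The same instant, computed by hand from the epoch.
by_hand = pd.Timestamp("1970-01-01") + pd.Timedelta(seconds=raw)
print(by_hand == as_seconds)  # True

# format='%Y%m%d' reads the digits as year, month, day instead.
as_date = pd.to_datetime(str(raw), format="%Y%m%d")
print(as_date)  # 2021-10-30 00:00:00
```

About 20 million seconds is roughly 234 days, so every value of this size ends up in late August 1970.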

To find the right format codes, look them up in the strftime/strptime documentation. The to_datetime function itself is described in the pandas documentation.

Here, %Y is the year with century as a decimal number, %m is the zero-padded month, and %d is the zero-padded day of the month. Together they match the 20211030 (year, month, day) layout of your data.
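
A short sketch of how these codes behave in practice. The sample values are made up, and the errors="coerce" argument is an optional addition, not part of the answer above: it turns values that do not match the format into NaT instead of raising an error.

```python
import pandas as pd

# Made-up sample; the last entry deliberately does not match the format.
values = pd.Series(["20211030", "20210105", "unknown"])

# %Y%m%d must match the whole string; non-matching values become NaT.
parsed = pd.to_datetime(values, format="%Y%m%d", errors="coerce")
print(parsed)

# The same codes work in the other direction when formatting dates as text.
print(parsed.dt.strftime("%d/%m/%Y"))
```

Counting the NaT entries with parsed.isna().sum() is a quick way to find out how many rows failed to parse.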
