Parse date in pandas with unit='D'

Time:12-04

What is wrong with this?

pd.to_datetime('2022-01-01',unit='D')

If I do it without the unit

pd.to_datetime('2022-01-01')

no error is raised. However, instead of the default unit ns, I would rather have D.
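A minimal reproduction of the two calls, assuming a recent pandas version. The exact error message differs between pandas versions, so the sketch only catches ValueError rather than matching the text:

```python
import pandas as pd

# Without unit=, the string is parsed as a calendar date.
parsed = pd.to_datetime("2022-01-01")
print(parsed)  # 2022-01-01 00:00:00

# With unit="D", pandas expects a number of days, so the date string is rejected.
try:
    pd.to_datetime("2022-01-01", unit="D")
    raised = False
except ValueError as exc:
    raised = True
    print(f"{type(exc).__name__}: {exc}")
```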

CodePudding user response:

The issue is that the unit parameter of pandas.to_datetime() specifies the unit of numeric input (for example, a count of days or seconds since the epoch), not the unit or format of the output.

If your input is a date string, use the format parameter to tell pandas how that string is laid out. Note that format controls how the input is parsed, not how the resulting datetime object is displayed:

dt = pd.to_datetime('2022-01-01', format='%Y-%m-%d')
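To illustrate that format= describes the input rather than the output, here is a small sketch using a day/month/year string (chosen for illustration). Displaying the result in a particular layout is a separate step, shown here with the standard Timestamp.strftime method:

```python
import pandas as pd

# format= describes how the INPUT string is laid out (day/month/year here).
dt = pd.to_datetime("01/02/2022", format="%d/%m/%Y")
print(dt)  # 2022-02-01 00:00:00

# The result is a full Timestamp; choosing an output layout is a separate step.
print(dt.strftime("%Y-%m-%d"))  # 2022-02-01
```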

CodePudding user response:

There is a fairly clear description, with examples, in the pandas documentation for to_datetime.

So it looks legitimate, doesn't it?
Here is the documentation's example, which uses unit='D' (days):

pd.to_datetime([1, 2, 3], unit='D',
               origin=pd.Timestamp('1960-01-01'))
Output:
DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)

What happened here? origin is the base date, and each number in the list is an offset from it, counted in the unit given by unit=. With unit='D' the offsets are days, so 1, 2 and 3 become 1960-01-02, 1960-01-03 and 1960-01-04. Now let's try a different unit, s (seconds), on a different list:

pd.to_datetime([0, 30, 64], unit='s',
               origin=pd.Timestamp('1960-01-01'))
Output:
DatetimeIndex(['1960-01-01 00:00:00', '1960-01-01 00:00:30',
               '1960-01-01 00:01:04'],
              dtype='datetime64[ns]', freq=None)

As expected, it works the same way: 0 leaves the base value unchanged, 30 adds 30 seconds, and 64 adds 64 seconds, giving 00:01:04.
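The rule above (origin plus each value counted in unit) can be checked by building the same timestamps by hand with pd.to_timedelta. This is only an illustrative equivalence check, not a claim about how pandas computes the result internally; the comparison goes element by element so it does not depend on the datetime resolution a given pandas version picks:

```python
import pandas as pd

origin = pd.Timestamp("1960-01-01")
values = [0, 30, 64]

# What to_datetime computes from unit= and origin=
via_unit = pd.to_datetime(values, unit="s", origin=origin)

# The same timestamps built by hand: base date plus offsets counted in seconds
by_hand = origin + pd.to_timedelta(values, unit="s")

print(list(via_unit) == list(by_hand))  # True
```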

To sum it up

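You are misusing the unit= parameter. It tells pandas how to read numeric values: each value is an offset, counted in that unit, from a base datetime given by origin= (by default the Unix epoch, 1970-01-01). A date string is not a numeric offset, which is why the call fails. If you want your date to be the base, pass it as origin='2022-01-01' and supply the offsets as numbers.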
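A minimal sketch of that usage, with the date passed as origin= and day offsets as numbers. The offsets 0, 1 and 31 are arbitrary example values; the second call shows the same date reached from the default Unix-epoch origin (2022-01-01 is 18993 days after 1970-01-01):

```python
import pandas as pd

# The date goes into origin=; the numbers are offsets from it, counted in days.
days = pd.to_datetime([0, 1, 31], unit="D", origin=pd.Timestamp("2022-01-01"))
print(list(days))

# With the default origin (the Unix epoch, 1970-01-01), 2022-01-01 is day 18993.
epoch_based = pd.to_datetime(18993, unit="D")
print(epoch_based)  # 2022-01-01 00:00:00
```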

If you don't need this functionality and just want to get the day from this value, then look at the other answer. Basically:

pd.to_datetime('2022-01-01', format='%Y-%m-%d').day
Output:
1

1 is the day of the month: 2022-01-01 is the first day of January 2022.
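Depending on what "a day" should mean, a Timestamp can be reduced in a few standard ways. This sketch assumes you want either the day number, a plain calendar date, or a daily-frequency Period; which one fits depends on how the value is used afterwards:

```python
import pandas as pd

ts = pd.to_datetime("2022-01-01", format="%Y-%m-%d")

print(ts.day)             # 1  (day of the month, an int)
print(ts.date())          # 2022-01-01  (a datetime.date, no time part)
print(ts.to_period("D"))  # 2022-01-01  (a Period with daily frequency)
```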

CodePudding user response:

The unit parameter of pd.to_datetime() already receives a string here: 'D' is a valid unit meaning days, so the type of that argument is not the problem. The problem is that unit only applies to numeric input. pandas tries to read the value as a number of days since the origin (by default the Unix epoch, 1970-01-01), and the string '2022-01-01' cannot be converted to a number, so a ValueError is raised.

To use unit='D', pass the number of days since the epoch instead of a date string:

pd.to_datetime(18993, unit='D')

Or put the date in origin= and give the offset in days:

pd.to_datetime(0, unit='D', origin='2022-01-01')

Either call returns Timestamp('2022-01-01 00:00:00'). It is worth noting that the default unit for numeric input to pd.to_datetime() is 'ns' (nanoseconds), so if you do not specify a unit, numbers are read as nanoseconds since the epoch.
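To see what the unit means for numeric input, here is a minimal sketch comparing the default 'ns' with 'D' on the same integer (printed output as shown by recent pandas versions):

```python
import pandas as pd

# No unit: an integer is read as nanoseconds since the Unix epoch.
as_ns = pd.to_datetime(1)
print(as_ns)  # 1970-01-01 00:00:00.000000001

# unit="D": the same integer is read as one day since the epoch.
as_days = pd.to_datetime(1, unit="D")
print(as_days)  # 1970-01-02 00:00:00
```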
