I find that pandas.Timestamp is an extremely powerful and flexible parsing tool that accepts a wide range of timestamp/datetime formats. E.g.
In [38]: pd.Timestamp('2020')
Out[38]: Timestamp('2020-01-01 00:00:00')
In [39]: pd.Timestamp('2020-02')
Out[39]: Timestamp('2020-02-01 00:00:00')
In [40]: pd.Timestamp('2020Q1')
Out[40]: Timestamp('2020-01-01 00:00:00')
But it doesn't always do the "magic" I was expecting, e.g. the following are rejected:
In [41]: pd.Timestamp('202003') # expecting 2020-03-01
ValueError: could not convert string to Timestamp
In [42]: pd.Timestamp('2020H2') # expecting 2020-07-01, i.e. 2020 second half (start)
ValueError: could not convert string to Timestamp
I tried to find a complete list of supported formats, but the documentation seems to be missing (or I'm missing something). Can anyone help? Thanks!
CodePudding user response:
pd.Timestamp is the pandas equivalent of Python’s datetime.datetime and is interchangeable with it in most cases. The datetime library accepts ISO 8601 date formats.
In Python, an ISO 8601 date is represented in the format YYYY-MM-DDTHH:MM:SS.mmmmmm. For example, May 18, 2022, is represented as 2022-05-18T11:40:22.519222.
YYYY: Year in four-digit format
MM: Month, from 01 to 12
DD: Day, from 01 to 31
T: The separator character between the date and time fields. In datetime.isoformat() it is the optional sep parameter, with a default value of "T".
HH: Hours
MM: Minutes
SS: Seconds
mmmmmm: Microseconds
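To make the ISO 8601 layout concrete, here is a minimal sketch that builds the example value from the text, formats it with the standard library, and reads it back with both datetime and pd.Timestamp. The variable names are only illustrative.

```python
from datetime import datetime

import pandas as pd

# The example instant from the text: May 18, 2022, 11:40:22.519222
dt = datetime(2022, 5, 18, 11, 40, 22, 519222)

# isoformat() uses "T" between date and time unless sep is given
iso = dt.isoformat()
iso_space = dt.isoformat(sep=" ")

# Both the standard library and pandas can read the ISO string back
parsed = datetime.fromisoformat(iso)
ts = pd.Timestamp(iso)

print(iso)
print(iso_space)
print(ts)
```

Sticking to this layout is the safest choice when a string has to be understood by both datetime and pandas.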
Directly from the pandas documentation for Timestamp:
There are essentially three calling conventions for the constructor. The primary form accepts four parameters. They can be passed by position or keyword.
The other two forms mimic the parameters from datetime.datetime. They can be passed by either position or keyword, but not both mixed together.
Examples
Using the primary calling convention:
This converts a datetime-like string
>>> pd.Timestamp('2017-01-01T12')
Timestamp('2017-01-01 12:00:00')
This converts a float representing a Unix epoch in units of seconds
>>> pd.Timestamp(1513393355.5, unit='s')
Timestamp('2017-12-16 03:02:35.500000')
This converts an int representing a Unix-epoch in units of seconds and for a particular timezone
>>> pd.Timestamp(1513393355, unit='s', tz='US/Pacific')
Timestamp('2017-12-15 19:02:35-0800', tz='US/Pacific')
Using the other two forms that mimic the API for datetime.datetime:
>>> pd.Timestamp(2017, 1, 1, 12)
Timestamp('2017-01-01 12:00:00')
>>> pd.Timestamp(year=2017, month=1, day=1, hour=12)
Timestamp('2017-01-01 12:00:00')
CodePudding user response:
Use a quarterly PeriodIndex if the strings denote quarters of a year:
import pandas as pd
df = pd.DataFrame({'date': ['2020-Q1']})
pd.PeriodIndex(df['date'], freq='Q').to_timestamp()
and for year-month strings, pass an explicit format:
pd.to_datetime('202003', format='%Y%m')
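For the two inputs from the question that pd.Timestamp rejects, one option is a small wrapper that checks for those patterns first and otherwise defers to pd.Timestamp. This is only a sketch: parse_loose is a hypothetical helper, not a pandas function, and it assumes that a six-digit string means YYYYMM and that "H1"/"H2" mean the half-years starting January 1 and July 1.

```python
import re

import pandas as pd

# Assumed conventions (not defined by pandas):
_HALF_YEAR = re.compile(r"^(\d{4})H([12])$")  # e.g. '2020H2'
_YEAR_MONTH = re.compile(r"^\d{6}$")          # e.g. '202003', read as YYYYMM


def parse_loose(text: str) -> pd.Timestamp:
    """Hypothetical helper: handle two extra formats, then fall back to pd.Timestamp."""
    text = text.strip()
    match = _HALF_YEAR.match(text)
    if match:
        year, half = int(match.group(1)), int(match.group(2))
        # First half starts in January, second half in July
        return pd.Timestamp(year=year, month=1 if half == 1 else 7, day=1)
    if _YEAR_MONTH.match(text):
        # An explicit format removes the ambiguity of a bare digit string
        return pd.to_datetime(text, format="%Y%m")
    # Everything else goes through the normal pandas parser
    return pd.Timestamp(text)


print(parse_loose("2020H2"))
print(parse_loose("202003"))
print(parse_loose("2020-02"))
```

The explicit patterns are checked before the fallback so that the custom formats never reach the general parser, which would otherwise raise ValueError for them.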