Python - Pandas - How to subscript / slice / select a datetime index dataframe with date string (str-CodePudding

I have a dataframe with a datetime index that looks as follows


Datetimeindex price1 ($/kWh) price2 ($/kWh) price3 ($/kWh)  Price4 ($/kWh)  price 5 
                    
2020-01-01  406.73  551.24  627.21  627.21  406.73
2020-01-02  397.90  551.24  627.21  627.21  397.90
2020-01-03  285.31  551.24  627.21  627.21  285.31
2020-01-04  255.36  551.24  627.21  627.21  255.36
2020-01-05  305.12  551.24  627.21  627.21  305.12
...     ...     ...     ...     ...     ...
2021-09-30  141.72  665.00  810.47  810.47  141.72
2021-10-01  123.73  691.41  841.26  841.26  123.73
2021-10-02  122.95  691.41  841.26  841.26  122.95
2021-10-03  113.81  691.41  841.26  841.26  113.81

I wanted to subscript the dataframe as

df['2020-01-01']

and got error :

KeyError: '2020-01-01'

if a parse to datetime the string then the error is

KeyError: datetime.datetime(2020, 1, 1, 0, 0)

I searched in the web and took the conclusion that the problem is datetime index type so i need to parse it to timestamp native pandas Date format but I can't accomplish it. Is there anyway to parse directly the index to timestamp?

EDIT:

this is the type of index

<bound method DatetimeIndex.to_series of DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
               '2020-01-09', '2020-01-10',
               ...
               '2021-09-25', '2021-09-26', '2021-09-27', '2021-09-28',
               '2021-09-29', '2021-09-30', '2021-10-01', '2021-10-02',
               '2021-10-03', '2021-10-04'],
              dtype='datetime64[ns]', name='Fecha', length=643, freq=None)>

CodePudding user response：

Note that subscripting date string in format df[string] has been deprecated since Pandas 1.2.0:

Deprecated indexing DataFrame rows with a single datetime-like string as df[string] (given the ambiguity whether it is indexing the rows or selecting a column), use df.loc[string] instead (GH36179)

See also GH36179 for details.

To subscript the DatetimeIndex with date string, now we have to use .loc, as follows:

df.loc['2020-01-01']

Result:

price1 ($/kWh)    406.73
price2 ($/kWh)    551.24
price3 ($/kWh)    627.21
Price4 ($/kWh)    627.21
price 5           406.73
Name: 2020-01-01 00:00:00, dtype: float64

You can continue to subscript by partial date string to specify a date range, e.g.

df.loc['2020-01']      # for date range within a month

Result:

            price1 ($/kWh)  price2 ($/kWh)  price3 ($/kWh)  Price4 ($/kWh)  price 5
2020-01-01          406.73          551.24          627.21          627.21   406.73
2020-01-02          397.90          551.24          627.21          627.21   397.90
2020-01-03          285.31          551.24          627.21          627.21   285.31
2020-01-04          255.36          551.24          627.21          627.21   255.36
2020-01-05          305.12          551.24          627.21          627.21   305.12

Or use a range of months:

df.loc['2021-09':'2021-10']

Result:

            price1 ($/kWh)  price2 ($/kWh)  price3 ($/kWh)  Price4 ($/kWh)  price 5
2021-09-30          141.72          665.00          810.47          810.47   141.72
2021-10-01          123.73          691.41          841.26          841.26   123.73
2021-10-02          122.95          691.41          841.26          841.26   122.95
2021-10-03          113.81          691.41          841.26          841.26   113.81

You can also just specify year:

df.loc['2021']

Result:

            price1 ($/kWh)  price2 ($/kWh)  price3 ($/kWh)  Price4 ($/kWh)  price 5
2021-09-30          141.72          665.00          810.47          810.47   141.72
2021-10-01          123.73          691.41          841.26          841.26   123.73
2021-10-02          122.95          691.41          841.26          841.26   122.95
2021-10-03          113.81          691.41          841.26          841.26   113.81

Also supports date strings in various formats:

df.loc['20200102': '20200104']          # YYYYmmdd

Result:

            price1 ($/kWh)  price2 ($/kWh)  price3 ($/kWh)  Price4 ($/kWh)  price 5
2020-01-02          397.90          551.24          627.21          627.21   397.90
2020-01-03          285.31          551.24          627.21          627.21   285.31
2020-01-04          255.36          551.24          627.21          627.21   255.36

df.loc['09/30/2021': '10/02/2021']      # mm/dd/YYYY

Result:

            price1 ($/kWh)  price2 ($/kWh)  price3 ($/kWh)  Price4 ($/kWh)  price 5
2021-09-30          141.72          665.00          810.47          810.47   141.72
2021-10-01          123.73          691.41          841.26          841.26   123.73
2021-10-02          122.95          691.41          841.26          841.26   122.95

CodePudding user response：

You are trying to catch a column with df['2020-01-01'], if you want to filter your index use df.index.filter, or df[df.index.isin(date…)] for entire df