I have a dataframe with a datetime index that looks as follows
Datetimeindex price1 ($/kWh) price2 ($/kWh) price3 ($/kWh) Price4 ($/kWh) price 5
2020-01-01 406.73 551.24 627.21 627.21 406.73
2020-01-02 397.90 551.24 627.21 627.21 397.90
2020-01-03 285.31 551.24 627.21 627.21 285.31
2020-01-04 255.36 551.24 627.21 627.21 255.36
2020-01-05 305.12 551.24 627.21 627.21 305.12
... ... ... ... ... ...
2021-09-30 141.72 665.00 810.47 810.47 141.72
2021-10-01 123.73 691.41 841.26 841.26 123.73
2021-10-02 122.95 691.41 841.26 841.26 122.95
2021-10-03 113.81 691.41 841.26 841.26 113.81
I wanted to subscript the dataframe as
df['2020-01-01']
and got error :
KeyError: '2020-01-01'
if a parse to datetime the string then the error is
KeyError: datetime.datetime(2020, 1, 1, 0, 0)
I searched in the web and took the conclusion that the problem is datetime index type so i need to parse it to timestamp native pandas Date format but I can't accomplish it. Is there anyway to parse directly the index to timestamp?
EDIT:
this is the type of index
<bound method DatetimeIndex.to_series of DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
'2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
'2020-01-09', '2020-01-10',
...
'2021-09-25', '2021-09-26', '2021-09-27', '2021-09-28',
'2021-09-29', '2021-09-30', '2021-10-01', '2021-10-02',
'2021-10-03', '2021-10-04'],
dtype='datetime64[ns]', name='Fecha', length=643, freq=None)>
CodePudding user response:
Note that subscripting date string in format df[string]
has been deprecated since Pandas 1.2.0:
Deprecated indexing DataFrame rows with a single datetime-like string as df[string] (given the ambiguity whether it is indexing the rows or selecting a column), use df.loc[string] instead (GH36179)
See also GH36179 for details.
To subscript the DatetimeIndex with date string, now we have to use .loc
, as follows:
df.loc['2020-01-01']
Result:
price1 ($/kWh) 406.73
price2 ($/kWh) 551.24
price3 ($/kWh) 627.21
Price4 ($/kWh) 627.21
price 5 406.73
Name: 2020-01-01 00:00:00, dtype: float64
You can continue to subscript by partial date string to specify a date range, e.g.
df.loc['2020-01'] # for date range within a month
Result:
price1 ($/kWh) price2 ($/kWh) price3 ($/kWh) Price4 ($/kWh) price 5
2020-01-01 406.73 551.24 627.21 627.21 406.73
2020-01-02 397.90 551.24 627.21 627.21 397.90
2020-01-03 285.31 551.24 627.21 627.21 285.31
2020-01-04 255.36 551.24 627.21 627.21 255.36
2020-01-05 305.12 551.24 627.21 627.21 305.12
Or use a range of months:
df.loc['2021-09':'2021-10']
Result:
price1 ($/kWh) price2 ($/kWh) price3 ($/kWh) Price4 ($/kWh) price 5
2021-09-30 141.72 665.00 810.47 810.47 141.72
2021-10-01 123.73 691.41 841.26 841.26 123.73
2021-10-02 122.95 691.41 841.26 841.26 122.95
2021-10-03 113.81 691.41 841.26 841.26 113.81
You can also just specify year:
df.loc['2021']
Result:
price1 ($/kWh) price2 ($/kWh) price3 ($/kWh) Price4 ($/kWh) price 5
2021-09-30 141.72 665.00 810.47 810.47 141.72
2021-10-01 123.73 691.41 841.26 841.26 123.73
2021-10-02 122.95 691.41 841.26 841.26 122.95
2021-10-03 113.81 691.41 841.26 841.26 113.81
Also supports date strings in various formats:
df.loc['20200102': '20200104'] # YYYYmmdd
Result:
price1 ($/kWh) price2 ($/kWh) price3 ($/kWh) Price4 ($/kWh) price 5
2020-01-02 397.90 551.24 627.21 627.21 397.90
2020-01-03 285.31 551.24 627.21 627.21 285.31
2020-01-04 255.36 551.24 627.21 627.21 255.36
df.loc['09/30/2021': '10/02/2021'] # mm/dd/YYYY
Result:
price1 ($/kWh) price2 ($/kWh) price3 ($/kWh) Price4 ($/kWh) price 5
2021-09-30 141.72 665.00 810.47 810.47 141.72
2021-10-01 123.73 691.41 841.26 841.26 123.73
2021-10-02 122.95 691.41 841.26 841.26 122.95
CodePudding user response:
You are trying to catch a column with df['2020-01-01'], if you want to filter your index use df.index.filter, or df[df.index.isin(date…)] for entire df