Home > OS >  How can I index a Date column that doesn't reference itself by index value or column name?
How can I index a Date column that doesn't reference itself by index value or column name?

Time:01-08

I'm trying to index date values from the date column in a panda dataframe.

                                 Open        High         Low       Close   Volume  Dividends  Stock Splits
Date
2013-01-07 00:00:00-05:00   71.951356   72.440875   71.877924   72.285858  1859400        0.0           0.0
2013-01-08 00:00:00-05:00   72.090030   72.514278   71.151782   71.535240  2173700        0.0           0.0
2013-01-09 00:00:00-05:00   71.926850   72.848777   71.779993   72.661125  2569200        0.0           0.0
2013-01-10 00:00:00-05:00   72.971141   73.370914   72.448986   73.354599  2364800        0.0           0.0
2013-01-11 00:00:00-05:00   73.313856   73.950232   73.036458   73.118050  3284900        0.0           0.0
...                               ...         ...         ...         ...      ...        ...           ...
2022-12-30 00:00:00-05:00  426.880005  429.739990  424.820007  428.760010   759900        0.0           0.0
2023-01-03 00:00:00-05:00  428.000000  428.160004  420.200012  424.290009  1166300        0.0           0.0
2023-01-04 00:00:00-05:00  423.929993  426.359985  420.209991  423.480011  1050800        0.0           0.0
2023-01-05 00:00:00-05:00  422.570007  423.489990  412.609985  414.730011  1485000        0.0           0.0
2023-01-06 00:00:00-05:00  416.200012  428.540009  416.010010  426.470001  1042200        0.0           0.0

When I look up the column names, the first column 'Date' doesn't exist, so am unable to reference it when setting an index.

>>> for col in stock_df.columns:
...     print(col)
...
Open
High
Low
Close
Volume
Dividends
Stock Splits

When I reference index[0] in my data frame, it shows the first column being 'Open'.

>>> stock_current_price = stock_df["Open"].loc[stock_df.index[0]]
>>> print(stock_current_price)
71.95135572116979

If I reference iloc[:, 0] to my dataframe (or any integer), I get the 'Open' column value with the date included. But I can't separate the date column to be used as an index.

>>> test = stock_df.iloc[:, 0]
>>> print(test)
Date
2013-01-07 00:00:00-05:00     71.951356
2013-01-08 00:00:00-05:00     72.090030
2013-01-09 00:00:00-05:00     71.926850
2013-01-10 00:00:00-05:00     72.971141
2013-01-11 00:00:00-05:00     73.313856
                            ...
2022-12-30 00:00:00-05:00    426.880005
2023-01-03 00:00:00-05:00    428.000000
2023-01-04 00:00:00-05:00    423.929993
2023-01-05 00:00:00-05:00    422.570007
2023-01-06 00:00:00-05:00    416.200012

This includes any index integer referenced, so other column iloc's will also have the date shown.

>>> test = stock_df.iloc[:, 4]
>>> print(test)
Date
2013-01-07 00:00:00-05:00    1859400
2013-01-08 00:00:00-05:00    2173700
2013-01-09 00:00:00-05:00    2569200
2013-01-10 00:00:00-05:00    2364800
2013-01-11 00:00:00-05:00    3284900
                          ...
2022-12-30 00:00:00-05:00     759900
2023-01-03 00:00:00-05:00    1166300
2023-01-04 00:00:00-05:00    1050800
2023-01-05 00:00:00-05:00    1485000
2023-01-06 00:00:00-05:00    1042200

I would like to take a large dataframe and specify a column value (Open) for example based on a specific "%Y-%m-%d" date. I'm worn out from coding tonight and hoping that a second set of eyes can take a look and help point me in the right direction.

CodePudding user response:

A DataFrame is surrounded by an index and columns; that's its primary difference from, say, a NumPy array. The "Date" values are in the index part, not in its own column, so, e.g., df["Date"] will fail and df.iloc[:, N] wouldn't find it for any N; because it's df.index. If you must, you can move it to become a column with df.reset_index().

However! Index is a crucial part of a DataFrame (or a Series). Almost all operations care about it and you should too :) Arguably, the "Date" is better kept as an index in your case, because

specify a column value (Open) for example based on a specific "%Y-%m-%d" date

is conveniently doable as it stands with

df.loc["1997-04-27", "Open"]

as opposed to

df.loc[df.Date.eq("1997-04-27"), "Open"]

if it ("Date") was a column instead. And it turns out it only gets worse if you want to perform more datetime-specific operations, e.g., resampling, subsetting etc.

(Noting that in df.loc, part before "," addresses index, and after "," addresses columns.)

  • Related