How to search for a specific date within concatenated DataFrame TimeSeries. Same Date would repeat s

Time:01-27

I downloaded historical price data for the ^GSPC share market index (S&P 500) and several other global indices. In each DataFrame the date is set as the index.

With the date as the index, selecting a row with .loc works as expected:

# S&P500 DataFrame = spx_df
spx_df.loc['2010-01-04']

Open            1.116560e+03
High            1.133870e+03
Low             1.116560e+03
Close           1.132990e+03
Volume          3.991400e+09
Dividends       0.000000e+00
Stock Splits    0.000000e+00
Name: 2010-01-04 00:00:00-05:00, dtype: float64
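A minimal sketch of why the single-ticker lookup works, assuming the downloaded frame has a tz-aware index like the -05:00 timestamp above. The one-row frame and the America/New_York zone are stand-ins for the real download; only the Close value is taken from the output above.

```python
import pandas as pd

# Hypothetical stand-in for spx_df: one daily row with an exchange-local
# time zone, matching the -05:00 offset in the output above.
idx = pd.DatetimeIndex(["2010-01-04"], tz="America/New_York", name="Date")
spx_df = pd.DataFrame({"Close": [1132.99]}, index=idx)

# A single ticker keeps a proper tz-aware DatetimeIndex, so .loc parses the
# date string in the index's own zone and finds the midnight row.
print(type(spx_df.index).__name__, spx_df.index.tz)
row = spx_df.loc["2010-01-04"]
print(row["Close"])
```

Because every timestamp sits at midnight, the string is treated as an exact key here and a single row comes back as a Series, as in the output above.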

I then concatenated several global stock market indices into a single DataFrame for further use. As a result, when the historical data for five indices is stacked into one time series, every date in the range appears five times, once per index.

markets = pd.concat(ticker_list, axis=0)
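To see how the duplicated dates arise, here is a small sketch with two placeholder frames (spx, ftse, the ^FTSE label and all values are hypothetical). It also shows concat's keys= option, a swapped-in alternative that labels each row with its source frame so a date can still be pulled out per ticker.

```python
import pandas as pd

# Two hypothetical, tz-naive daily frames standing in for two indices.
# The numbers are placeholders, not real prices.
dates = pd.DatetimeIndex(["2010-01-04", "2010-01-05"], name="Date")
spx = pd.DataFrame({"Close": [1.0, 2.0]}, index=dates)
ftse = pd.DataFrame({"Close": [3.0, 4.0]}, index=dates)

# Stacking along axis=0 keeps every row, so each date now appears twice.
markets = pd.concat([spx, ftse], axis=0)
print(markets.index.is_unique)
print(markets.index.value_counts())

# keys= adds an outer index level recording which frame each row came from.
tagged = pd.concat([spx, ftse], keys=["^GSPC", "^FTSE"], names=["Ticker", "Date"])
print(tagged.loc["^FTSE"])

# xs() pulls one date out of the inner level, giving one row per ticker.
day_rows = tagged.xs(pd.Timestamp("2010-01-04"), level="Date")
print(day_rows)
```

With the ticker kept in the index, "which rows belong to 2010-01-04" and "which index each row came from" no longer compete for the same key.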

I want to reference a single date in the concatenated DataFrame and store it in a variable. I would prefer that the variable not be a datetime object, because I want to pass it to .loc inside a function (def). How does concatenation affect row access by date when the same date appears several times in the combined time series?

This is what I have tried so far:

# markets = concatenated DataFrame 
Reference_date = markets.loc['2010-01-04'] 
# KeyError: '2010-01-04'

Reference_date = markets.loc[markets.Date == '2010-01-04']
# AttributeError: Date is the index, not a column, so markets.Date does not exist
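One likely cause of the KeyError (hedged, since the download code is not shown): each frame carries its own exchange time zone, as the -05:00 offset above suggests, and concatenating indices in different zones leaves the combined index in a form a plain date string no longer matches. A minimal sketch, assuming tz-aware frames; make_frame, rows_on and the placeholder values are hypothetical.

```python
import pandas as pd


def make_frame(tz: str, close: float) -> pd.DataFrame:
    """Hypothetical one-row frame standing in for one downloaded index."""
    idx = pd.DatetimeIndex(["2010-01-04"], tz=tz, name="Date")
    return pd.DataFrame({"Close": [close]}, index=idx)


# Placeholder data for two exchanges in different time zones (assumed).
frames = [make_frame("America/New_York", 1.0), make_frame("Asia/Tokyo", 2.0)]

# Concatenating mixed time zones: depending on the pandas version the result
# is an object-dtype index or a UTC DatetimeIndex, not the exchange-local one.
mixed = pd.concat(frames)
print(mixed.index.dtype)

# Fix: drop the time zone from each frame first (local wall-clock time is
# kept), then sort so the combined DatetimeIndex is monotonic.
markets = pd.concat([f.tz_localize(None) for f in frames]).sort_index()


def rows_on(df: pd.DataFrame, day: str) -> pd.DataFrame:
    """Return every row dated `day`, given as a plain 'YYYY-MM-DD' string."""
    # normalize() floors each timestamp to midnight, so rows that carry a
    # time of day still match; the boolean mask always yields a DataFrame.
    return df.loc[df.index.normalize() == pd.Timestamp(day)]


print(rows_on(markets, "2010-01-04"))
```

Stripping the zone keeps each market's own calendar date; converting everything to UTC instead would move the Tokyo session onto the previous day, since midnight in Tokyo is 15:00 UTC the day before.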

Answer:

To access a specific date in the concatenated DataFrame, you can use boolean indexing instead of passing the date directly to .loc. This returns a DataFrame containing every row whose index equals the reference date:

reference_date = markets[markets.index == '2010-01-04']
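A sketch of the boolean-indexing approach on placeholder data, assuming the combined index is a tz-naive DatetimeIndex (for example after removing the time zone from each frame before concatenating). The Ticker column and the values are hypothetical.

```python
import pandas as pd

# Hypothetical combined frame: the same date appears once per ticker.
idx = pd.DatetimeIndex(["2010-01-04", "2010-01-04", "2010-01-05"], name="Date")
markets = pd.DataFrame({"Ticker": ["^GSPC", "^FTSE", "^GSPC"],
                        "Close": [1.0, 2.0, 3.0]}, index=idx)

reference_day = "2010-01-04"  # a plain string, not a datetime object

# Comparing the index with the string gives one boolean per row. On a
# DatetimeIndex the string is parsed as midnight of that day, so this is an
# exact match and would miss rows that carry a time of day.
mask = markets.index == reference_day
reference_rows = markets[mask]
print(reference_rows)
```

If the index still mixes time zones, the same comparison may quietly match nothing, so checking markets.index.dtype first is worthwhile.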

Alternatively, you can use the query() method to search for specific data:

reference_date = markets.query('index == "2010-01-04"')

Keep in mind that reference_date is still a DataFrame containing every row that matches the reference date across all of the concatenated DataFrames. To extract a single column, index the result by column name:

reference_date_Open = markets.query('index == "2010-01-04"')["Open"]
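The query() and column-selection steps on the same kind of placeholder data. The Ticker column and the values are assumptions; .loc with a mask and a column name is shown as an equivalent one-step form.

```python
import pandas as pd

# Same hypothetical layout: tz-naive DatetimeIndex plus an assumed Ticker column.
idx = pd.DatetimeIndex(["2010-01-04", "2010-01-04", "2010-01-05"], name="Date")
markets = pd.DataFrame({"Ticker": ["^GSPC", "^FTSE", "^GSPC"],
                        "Open": [1.0, 2.0, 3.0]}, index=idx)

# query() compares the index with the string; because the index is
# datetime-like, pandas converts the string to a Timestamp first.
day_rows = markets.query('index == "2010-01-04"')

# Selecting a column after filtering gives a Series with one value per ticker.
reference_date_open = day_rows["Open"]

# Equivalent .loc form that filters rows and picks the column in one step.
same_open = markets.loc[markets.index == "2010-01-04", "Open"]
print(reference_date_open.equals(same_open))
```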

Answer:

Since the date is set as the index, you should be able to use:

Reference_date = markets.loc[markets.index == '2010-01-04']
