Home > Software engineering >  Reindexing Pandas based on daterange
Reindexing Pandas based on daterange

Time:01-17

I am trying to reindex the dates in pandas. This is because there are dates which are missing, such as weekends or national hollidays.

To do this I am using the following code:

import pandas as pd
import yfinance as yf 
import datetime 

start = datetime.date(2015,1,1)
end = datetime.date.today()

df = yf.download('F', start, end, interval ='1d', progress = False)
df.index = df.index.strftime('%Y-%m-%d')
full_dates = pd.date_range(start, end)
df.reindex(full_dates)

This code is producing this dataframe:

    Open    High    Low     Close   Adj Close   Volume
2015-01-01  NaN     NaN     NaN     NaN     NaN     NaN
2015-01-02  NaN     NaN     NaN     NaN     NaN     NaN
2015-01-03  NaN     NaN     NaN     NaN     NaN     NaN
2015-01-04  NaN     NaN     NaN     NaN     NaN     NaN
2015-01-05  NaN     NaN     NaN     NaN     NaN     NaN
...     ...     ...     ...     ...     ...     ...
2023-01-13  NaN     NaN     NaN     NaN     NaN     NaN
2023-01-14  NaN     NaN     NaN     NaN     NaN     NaN
2023-01-15  NaN     NaN     NaN     NaN     NaN     NaN
2023-01-16  NaN     NaN     NaN     NaN     NaN     NaN
2023-01-17  NaN     NaN     NaN     NaN     NaN     NaN

Could you please advise why is it not reindexing the data and showing NaN values instead?

===Edit ===

Could it be a python version issue? I ran the same code in python 3.7 and 3.10

In python 3.7

enter image description here

In python 3.10

enter image description here

In python 3.10 - It is datetime as you can see from the image. enter image description here

Getting datetime after yf.download('F', start, end, interval ='1d', progress = False) without strftime

enter image description here

CodePudding user response:

Remove converting DatetimeIndex to strings by df.index = df.index.strftime('%Y-%m-%d'), so can reindex by datetimes.

df = yf.download('F', start, end, interval ='1d', progress = False)
full_dates = pd.date_range(start, end)
df = df.reindex(full_dates)

print (df)
             Open   High    Low  Close  Adj Close      Volume
2015-01-01    NaN    NaN    NaN    NaN        NaN         NaN
2015-01-02  15.59  15.65  15.18  15.36  10.830517  24777900.0
2015-01-03    NaN    NaN    NaN    NaN        NaN         NaN
2015-01-04    NaN    NaN    NaN    NaN        NaN         NaN
2015-01-05  15.12  15.13  14.69  14.76  10.407450  44079700.0
          ...    ...    ...    ...        ...         ...
2023-01-13  12.63  12.82  12.47  12.72  12.720000  96317800.0
2023-01-14    NaN    NaN    NaN    NaN        NaN         NaN
2023-01-15    NaN    NaN    NaN    NaN        NaN         NaN
2023-01-16    NaN    NaN    NaN    NaN        NaN         NaN
2023-01-17    NaN    NaN    NaN    NaN        NaN         NaN

[2939 rows x 6 columns]

print (df.index)
DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04',
               '2015-01-05', '2015-01-06', '2015-01-07', '2015-01-08',
               '2015-01-09', '2015-01-10',
               ...
               '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-11',
               '2023-01-12', '2023-01-13', '2023-01-14', '2023-01-15',
               '2023-01-16', '2023-01-17'],
              dtype='datetime64[ns]', length=2939, freq='D')

EDIT: There is timezones difference, for remove it use DatetimeIndex.tz_convert:

df = yf.download('F', start, end, interval ='1d', progress = False)

df.index= df.index.tz_convert(None)

full_dates = pd.date_range(start, end)
df = df.reindex(full_dates)
print (df)
            

CodePudding user response:

You need to use strings in reindex to keep an homogeneous type, else pandas doesn't match the string (e.g., 2015-01-02) with the Timestamp (e.g., pd.Timestamp('2015-01-02')):

df.reindex(full_dates.astype(str))

#or
df.reindex(full_dates.strftime('%Y-%m-%d'))

Output:

             Open   High    Low  Close  Adj Close      Volume
2015-01-01    NaN    NaN    NaN    NaN        NaN         NaN
2015-01-02  15.59  15.65  15.18  15.36  10.830517  24777900.0
2015-01-03    NaN    NaN    NaN    NaN        NaN         NaN
2015-01-04    NaN    NaN    NaN    NaN        NaN         NaN
2015-01-05  15.12  15.13  14.69  14.76  10.407451  44079700.0
...           ...    ...    ...    ...        ...         ...
2023-01-13  12.63  12.82  12.47  12.72  12.720000  96317800.0
2023-01-14    NaN    NaN    NaN    NaN        NaN         NaN
2023-01-15    NaN    NaN    NaN    NaN        NaN         NaN
2023-01-16    NaN    NaN    NaN    NaN        NaN         NaN
2023-01-17    NaN    NaN    NaN    NaN        NaN         NaN

[2939 rows x 6 columns]
  • Related