Rolling window and problem with slice indexing

I use this Python code to calculate the CQ (cross-quantilogram) statistic for each year in my dataset:

import pandas as pd
import numpy as np
import time
from CrossQuantilogram import Bootstrap
import CrossQuantilogram
from CrossQuantilogram import LjungBoxQ
from CrossQuantilogram import CQBS  # called inside CQBS_years below

d1=pd.read_csv(r"...\sgold.csv")
d2=pd.read_csv(r"...\cgold.csv")

def CQBS_years(data1,a1,data2,a2,k=1,window=1,cqcl=0.95,testf=LjungBoxQ,testcl=0.95,
               all=False,n=1000,verbose=True):

    startyear,endyear = 2010, 2019
    if window>1+endyear-startyear:
        raise ValueError("length of window must <= data range")

    # (start, end) pairs of year strings, e.g. ('2010', '2010') for window=1
    cqres,yearlist=[],[(str(x),str(x+window-1)) for x in range(startyear,endyear-window+2)]
    for start,end in yearlist:
        if verbose:
            print("Processing {}/{}   ".format(end,endyear),end='\r')
        # label-based slice by year strings: data1/data2 need a DatetimeIndex
        cqres.append(CQBS(data1[start:end],a1,data2[start:end],a2,k,cqcl,testf,testcl,n,False))

    res,yearindex=[],[str(x) for x in range(startyear+window-1,endyear+1)]
    if all:
        for i in [[df.iloc[x] for df in cqres] for x in range(k)]:
            merged = pd.concat(i,ignore_index=True)
            merged.index = yearindex
            res.append(merged)        
    else:
        res=pd.concat(cqres,ignore_index=True)
        res.index = yearindex
    if verbose:
        print("Bootstrapping CQ done      ")
    return res

%%time
CrossQuantilogram.CQBS_years(d1["day"],0.1,d2["day"],0.1,k=1,window=1,cqcl=0.95,testcl=0.95,all=False,n=1000,verbose=True)

When I run the CQBS_years function, I get this error: "cannot do slice indexing on RangeIndex with these indexers [2010] of type str". I know it is related to the dates in my CSV files being read as strings, but I don't know how to solve it.
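For reference, here is a standalone sketch of the year-window arithmetic inside CQBS_years. The helper year_windows is hypothetical and not part of the CrossQuantilogram package; it only reproduces the list comprehensions. It shows that start and end are year strings such as '2010', which is why data1[start:end] is a label-based slice.

```python
# Hypothetical helper reproducing the year-window logic of CQBS_years.
def year_windows(startyear, endyear, window=1):
    """Return (start, end) year-string pairs and the label year of each window."""
    if window > 1 + endyear - startyear:
        raise ValueError("length of window must <= data range")
    pairs = [(str(x), str(x + window - 1))
             for x in range(startyear, endyear - window + 2)]
    # Each window is labelled by its last year.
    labels = [str(x) for x in range(startyear + window - 1, endyear + 1)]
    return pairs, labels


pairs, labels = year_windows(2010, 2019, window=3)
print(pairs[0], pairs[-1])    # ('2010', '2012') ('2017', '2019')
print(labels[0], labels[-1])  # 2012 2019
```

With window=1 every pair is a single year, e.g. ('2010', '2010'), and there are 10 windows for 2010 to 2019.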

The dataset is available at this link: https://drive.google.com/drive/folders/1PXyXP3AK8_KYxRYfZWO3VHzPPueG3FEF?usp=sharing

Here is the source of the code: https://github.com/wangys96/Cross-Quantilogram

Any help is greatly appreciated.

Answer:

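The problem is that a string slice such as df['2010':'2010'] is label-based: pandas only interprets '2010' as "all rows in 2010" when df is indexed by dates (a DatetimeIndex). Read with the defaults, your CSV files get a RangeIndex (0, 1, 2, ...), so the slice data1[start:end] inside CQBS_years fails with exactly this error.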
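A minimal illustration of the difference, using made-up toy values rather than the gold data: the same string slice fails on a RangeIndex and selects a whole year on a DatetimeIndex.

```python
import pandas as pd

# Hypothetical toy data: dates stored as plain strings, one value per date.
dates = ["2010-01-04", "2010-06-01", "2011-03-15", "2011-12-30"]
values = [1.0, 2.0, 3.0, 4.0]

# Default RangeIndex (0, 1, 2, ...), like a plain pd.read_csv result.
s_range = pd.Series(values)
try:
    s_range["2010":"2010"]
except TypeError as exc:
    print("RangeIndex:", exc)  # same TypeError as in the question

# Same values indexed by parsed dates: '2010' now means the whole year.
s_dates = pd.Series(values, index=pd.to_datetime(dates))
print(s_dates["2010":"2010"])
```

The second slice returns only the two 2010 rows; this partial-string indexing is what CQBS_years relies on.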

So you need to read the data, parse the date column as datetime, and set that column as the index. read_csv can do all of this in one step:

# parse the first column as dates and use it as the index
d1=pd.read_csv(r"...\sgold.csv", parse_dates=[0], index_col=[0])
d2=pd.read_csv(r"...\cgold.csv", parse_dates=[0], index_col=[0])
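To check the fix without the real files, here is a sketch that runs the same read_csv call on an in-memory CSV. The column names "date" and "day" and the values are assumptions for illustration; only "day" appears in the original call. Sorting the index is an extra precaution in case the file is not in date order, since pandas can refuse partial-string slices on an unsorted DatetimeIndex.

```python
import io

import pandas as pd

# Hypothetical stand-in for sgold.csv: first column holds the dates,
# "day" holds the series passed to CQBS_years. Real column names may differ.
csv_text = """date,day
2011-01-03,0.4
2010-01-04,0.1
2010-07-01,-0.2
2011-06-30,0.3
"""

d1 = pd.read_csv(io.StringIO(csv_text), parse_dates=[0], index_col=[0])
d1 = d1.sort_index()  # make year slices reliable if rows are out of order

print(type(d1.index).__name__)  # DatetimeIndex

# The same kind of per-year slice that CQBS_years performs internally.
for year in ["2010", "2011"]:
    print(year, d1["day"][year:year].tolist())
```

With the DatetimeIndex in place, d1["day"] and d2["day"] can be passed to CQBS_years unchanged.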