Avoid "single positional indexer is out-of-bounds" error when searching for values in Pandas

what I have

I have multiple dataframes stored as CSVs. Each dataframe has two columns: Col1 and Col2.

what I do

I read the CSV files into separate dataframes, make a list of the dataframes, and then search for the first occurrence of a value in "Col2", but only on the condition that the value appears in at least 10 consecutive rows. When that value is found in "Col2", I print the corresponding value from "Col1".

where the problem is

When I did it with 5 consecutive rows my script worked. However, when I try it with (at least) 10 rows, this error occurs: IndexError: single positional indexer is out-of-bounds. It happens because some dataframes don't meet the criteria: the value in "Col2" does not appear in 10 consecutive rows, so the filtered result is empty and .iloc[0] has no row to return.
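A minimal sketch of the failure, assuming a small made-up dataframe (the values below are hypothetical, not taken from the real CSVs). It only illustrates that the filter can return an empty dataframe and that indexing into it with .iloc[0] raises the error:

```python
import pandas as pd

# Hypothetical data: Col2 has only 3 consecutive zeros, too few to pass the filter
df = pd.DataFrame({"Col1": [1, 2, 3, 4, 5], "Col2": [0.5, 0.0, 0.0, 0.0, 0.7]})

zeros = df[df["Col2"] == 0.0]          # rows where Col2 is zero
run_id = (df["Col2"] != 0.0).cumsum()  # consecutive zeros share one group id
filtered = zeros.groupby(run_id).filter(lambda x: len(x) > 10)

print(filtered.empty)  # the filter kept no rows

try:
    filtered.iloc[0]   # there is no first row to return
except IndexError as err:
    print(err)
```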

how I want to solve it (I guess)

For the dataframes that don't meet the criteria, "NaN" should be printed instead of the corresponding value from "Col1".

Code 1/2:

import os
import pandas as pd

# CSV files in the current directory
csvs = [x for x in os.listdir('.') if x.endswith('.csv')]
# read the CSV files as separate dataframes
dfs = []
for file in csvs:
    df = pd.read_csv(file)
    dfs.append(df)

Code 2/2:

for dff1 in dfs:
    dff1[dff1['Col2'] == 0.000000].groupby((dff1['Col2'] != 0.000000).cumsum()).filter(lambda x: len(x) > 10)
    index = dff1[dff1['Col2'] == 0.000000].groupby((dff1['Col2'] != 0.000000).cumsum()).filter(lambda x: len(x) > 10).iloc[0][0] #this finds the corresponding value in Col1 if the Col2-value meets the criteria
    print(index) #this prints corresponding value from Col1

Then my code goes on to build a single dataframe from the printed values, with the CSV names assigned in a new column. Maybe there is some trick with "iloc" or "filter"?

desired output:

11
9
NaN
NaN
6
...

Right now it just stops after the second dataframe: it prints only 11 and 9, then raises the error because the third dataframe doesn't meet the criteria.

CodePudding user response：

This can be avoided with a simple if statement: check whether the filtered dataframe is empty before indexing into it with iloc.

import numpy as np

for dff1 in dfs:
    # keep only runs of consecutive zeros in Col2 that are longer than 10 rows
    revised_df = dff1[dff1['Col2'] == 0.000000].groupby((dff1['Col2'] != 0.000000).cumsum()).filter(lambda x: len(x) > 10)
    if not revised_df.empty:
        # first row, first column (Col1) of the qualifying runs
        index = revised_df.iloc[0, 0]
        print(index)
    else:
        # no run met the criteria for this dataframe
        print(np.nan)
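
If the values are collected into one dataframe afterwards, the same check could be wrapped in a small helper. This is only a sketch under some assumptions: the function name first_col1_in_zero_run, the min_run parameter, and the example frames are all hypothetical. Note also that len(x) > 10 in the question requires 11 or more rows; the sketch uses >= on the assumption that "at least 10" is what is intended.

```python
import numpy as np
import pandas as pd


def first_col1_in_zero_run(df, min_run=10, value=0.0):
    """Return Col1 at the start of the first run of `value` in Col2 that is
    at least `min_run` rows long, or NaN if no such run exists."""
    is_value = df["Col2"] == value
    matches = df[is_value]
    if matches.empty:
        return np.nan
    # Every non-matching row bumps the counter, so consecutive matching
    # rows end up with the same group id.
    run_id = (~is_value).cumsum()[is_value]
    long_runs = matches.groupby(run_id).filter(lambda g: len(g) >= min_run)
    if long_runs.empty:
        return np.nan
    return long_runs["Col1"].iloc[0]


# Hypothetical example frames standing in for the loaded CSVs
frames = {
    "a.csv": pd.DataFrame({"Col1": range(1, 16),
                           "Col2": [1.0, 2.0] + [0.0] * 10 + [3.0, 0.0, 0.0]}),
    "b.csv": pd.DataFrame({"Col1": range(5),
                           "Col2": [0.0, 0.0, 1.0, 0.0, 0.0]}),
}

# One row per file: the file name and the value found (NaN if none)
results = pd.DataFrame({
    "file": list(frames),
    "Col1": [first_col1_in_zero_run(df) for df in frames.values()],
})
print(results)
```

Returning NaN from the helper, rather than printing inside the loop, keeps one result per file, so the values stay aligned with the CSV names.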