I am using the following code to detect a row number (the header) in multiple CSV files which I am processing:
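import numpy as np
import pandas as pd
combined_xlsx = pd.read_excel(xlsxfile, nrows=20)
out = np.where([combined_xlsx.values == 'PCA'])[1][0]
# + 1 because row 0 of the file was consumed as the preview's header
combined_xlsx = pd.read_excel(xlsxfile, header=out + 1)
combined_xlsx.dropna(subset=['PCA'], inplace=True)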
The header row is determined by where the value 'PCA' first occurs; that row number is stored and then used to re-read the whole file. I cannot pass a fixed number to the header= parameter because the header row sits at a different position in each of the original files.
When the header row is at row position 0, the code fails with the following error:
IndexError: index 0 is out of bounds for axis 0 with size 0
How can I solve this issue and correctly determine the header row whether at row position 0 or not?
CodePudding user response:
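Your usage of np.where is wrong. Its function signature is numpy.where(condition, [x, y]), where condition is a boolean array. Where condition is True it yields x, otherwise it yields y. Called with condition alone, it returns the indices of the True entries instead, and those index arrays are empty when nothing matches.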
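As a small illustration of the two call forms, here is a sketch on made-up arrays (the values are only examples, not the asker's data). It also shows why indexing the result with [0] fails when there is no match:

```python
import numpy as np

mask = np.array([True, False, True])

# Three-argument form: take 1 where mask is True, 0 elsewhere.
picked = np.where(mask, 1, 0)

# One-argument form: returns one index array per axis for the True entries.
grid = np.array([["a", "PCA"], ["b", "c"]])
rows, cols = np.where(grid == "PCA")

# With no match the index arrays are empty, so result[0] would raise IndexError.
empty = np.where(np.array(["a", "b"]) == "PCA")[0]
```

Here rows and cols each hold a single index pointing at the 'PCA' cell, while empty has size 0.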
CodePudding user response:
Fixed with the following function:
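def headerfinder(df, mystr):
    # If mystr is already a column name, the header is on row 0.
    cols = df.columns.isin([mystr])
    if True in cols:
        out = 0
    else:
        # Otherwise use the first data row containing mystr;
        # + 1 because row 0 of the file became df's header.
        out = np.where([df.values == mystr])[1][0] + 1
    return out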
Not a neat solution, but works in my case.
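A possible alternative, sketched here under the assumption that the preview is read with header=None so that the first row of the file stays in the data and needs no special case. The name find_header_row is hypothetical, and the two small DataFrames stand in for pd.read_excel(xlsxfile, header=None, nrows=20):

```python
import numpy as np
import pandas as pd

def find_header_row(preview, marker):
    """Return the 0-based row of the first cell equal to marker, or None."""
    rows, _ = np.where(preview.values == marker)
    return int(rows[0]) if rows.size else None

# Stand-ins for a preview read with header=None (illustrative data only).
preview_top = pd.DataFrame([["ID", "PCA"], [1, 0.5]])
preview_low = pd.DataFrame([["notes", None], ["ID", "PCA"], [1, 0.5]])

row_top = find_header_row(preview_top, "PCA")   # header on the first row
row_low = find_header_row(preview_low, "PCA")   # header one row down
row_none = find_header_row(preview_top, "XYZ")  # marker absent
```

The returned row can then be passed straight to pd.read_excel(xlsxfile, header=row) without adding 1, since no row was consumed as a header in the preview.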