python for loop to set headers-CodePudding

Hi I have a very large excel file that does not have consistent headers on each tab (ie: headers are found in row 18-23). I am trying to write a for loop that will iterate through each tab to set the headers correctly.

I have imported this with pd.read_excel and all data is in a dictionary with tab name = key, and tab data = dataframe. I got here but am not able to get the row number where the condition is true (that will then need to be set as the header).

df_GRID=pd.read_excel(input_GRID, sheet_name=df_GRID_tabs, header=0)

for key,values in df_GRID.items():
    print(key,values.iloc[:, 0] == '8760 Time Series')

CodePudding user response：

Maybe this is not that useful to you, but in my case, I use xls2csv.py from HERE. It does a lot of useful things like allow stripping empty lines, etc.

I make a pipeline of XLSX -> CSV -> Edited_CSV -> Processing...

To turn CSV -> Edited_CSV, I just loop through the rows looking for an indication of headers, something like this:

header_seen = False
outlines = []
with open(csvfile, 'r', newline='', encoding='utf-8') as infile:
  reader = csv.reader(infile, delimiter=',')
  for line in list(reader):
    if not header_seen:
      if line[0] == 'PATTERN':
        header_seen = True
        outlines.append(line)
    else:
      outlines.append(line)

CodePudding user response：

You could use np.where:

print(key, np.where(values.iloc[:,0] == '8760 Time Series')[0][0])