Hi I have a very large excel file that does not have consistent headers on each tab (ie: headers are found in row 18-23). I am trying to write a for loop that will iterate through each tab to set the headers correctly.
I have imported this with pd.read_excel and all data is in a dictionary with tab name = key, and tab data = dataframe. I got here but am not able to get the row number where the condition is true (that will then need to be set as the header).
df_GRID=pd.read_excel(input_GRID, sheet_name=df_GRID_tabs, header=0)
for key,values in df_GRID.items():
print(key,values.iloc[:, 0] == '8760 Time Series')
CodePudding user response:
Maybe this is not that useful to you, but in my case, I use xls2csv.py
from HERE. It does a lot of useful things like allow stripping empty lines, etc.
I make a pipeline of XLSX -> CSV -> Edited_CSV -> Processing...
To turn CSV -> Edited_CSV, I just loop through the rows looking for an indication of headers, something like this:
header_seen = False
outlines = []
with open(csvfile, 'r', newline='', encoding='utf-8') as infile:
reader = csv.reader(infile, delimiter=',')
for line in list(reader):
if not header_seen:
if line[0] == 'PATTERN':
header_seen = True
outlines.append(line)
else:
outlines.append(line)
CodePudding user response:
You could use np.where
:
print(key, np.where(values.iloc[:,0] == '8760 Time Series')[0][0])