Home > Blockchain >  how to parse excel file where columns are not properly mentioned in python
how to parse excel file where columns are not properly mentioned in python

Time:09-07

I am looking to find the values under a specific cell which are attached below. I have many excel files like this and find out the filename_pattern cell value (in FEED row). Here in this ex: its BBB_ALLOC.gz*

Can someone guide me how to find it using python or using any python libraries ?

I have tried following following code using openpyxl, but am not sure how to get the values once in iterate till FEED.

from openpyxl import load_workbook

file = "mysheet.xlsx"

wb_obj = load_workbook(filename=file)
wsheet = wb_obj['sheet1']
print('hello')
dataDict = {}

for key, *values in wsheet.iter_rows():
    dataDict[key.value] = [v.value for v in values]
print(dataDict)
for k, v in dataDict.items():
    if k == 'FEED':
        print(v)

enter image description here

CodePudding user response:

You are on right path with your code. You've got values of first column as keys in the dictionary dataDict. Once you find row with key == "FEED", iterate that row to find cell with value "filename_pattern". The cell, in same column but next row, is your desired filename value.

But you've collected the rows in a dictionary, which means you loose order of the rows and can't get next row. You should iterate it in a different manner.

from openpyxl import load_workbook

file = "admin_recomm_comparison.xlsx"

wb_obj = load_workbook(filename=file)
wsheet = wb_obj['sheet1']

filename = None
feed_row = None
for row in wsheet.iter_rows():
    if row[0].value == "FEED":
        feed_row = row
        break

filename_pattern_cell = None
if feed_row:
    for cell in feed_row:
        if cell.value == "filename_pattern":
            filename_pattern_cell = cell
            break

if filename_pattern_cell:
    cell_name = str(filename_pattern_cell.column_letter)   str(int(filename_pattern_cell.row)   1)
    print(wsheet[cell_name].value)
  • Related