Nested dictionary being overwritten by last value-CodePudding

I'm trying to read an excel file using openpyxl and put it into a nested dictionary. However, the excel file does not have fixed column names.

For example it could be like this: with columns A,B,C or with columns A,C,D,E

Here's my code so far:

wb = openpyxl.load_workbook(path)
sheet = wb.active
max_col = sheet.max_column
col_list = []

for i in range (1, max_col   1):
    cell_obj = sheet.cell(row = 1, column = i)
    col_list.append(cell_obj.value)
#print (col_list)


Data = {}
sheet_data2 = {}
for row in sheet.iter_rows(min_row=2,values_only= True):
    sheet_name = row[0]
    for i in range(1, max_col):
        temp = {
            col_list[i]:row[i]
        }
        sheet_data2.update(temp)
    Data[sheet_name] = sheet_data2

#print(Data)

My current output:

{1: {'A': 83, 'B': 1, 'C': 25}, 2: {'A': 83, 'B': 1, 'C': 25}, 3: {'A': 83, 'B': 1, 'C': 25}, 4: {'A': 83, 'B': 1, 'C': 25}}

Output i want:

{1: {'A': 30, 'B': 5, 'C': 2}, 2: {'A': 10, 'B': 9, 'C': 8}, 3: {'A': 65, 'B': 7, 'C': 42}, 4: {'A': 83, 'B': 1, 'C': 25}}

My nested dictionary keeps getting overwritten by the last row of values.

Also, if i want to extract a value (eg. A: 30) from my nested dictionary, how do i do so without knowing the column names?

CodePudding user response：

When you use Data[sheet_name] = sheet_data2, it only stores the reference of the sheet_data2 dictionary in the Data dictionary and not the values. So all of your Data[sheet_name] are pointing to the same data, and when you change sheet_data2 in the next iteration of the loop, it overwrites the previous sheet data you had.

You can put sheet_data2 in the for loop, so it creates a new dictionary in each iteration:

Data = {}
for row in sheet.iter_rows(min_row=2,values_only= True):
    sheet_data2 = {}
    sheet_name = row[0]
    for i in range(1, max_col):
        temp = {
            col_list[i]:row[i]
        }
        sheet_data2.update(temp)
    Data[sheet_name] = sheet_data2