I'm trying to read an excel file using openpyxl and put it into a nested dictionary. However, the excel file does not have fixed column names.
For example it could be like this: with columns A,B,C or with columns A,C,D,E
Here's my code so far:
wb = openpyxl.load_workbook(path)
sheet = wb.active
max_col = sheet.max_column
col_list = []
for i in range (1, max_col 1):
cell_obj = sheet.cell(row = 1, column = i)
col_list.append(cell_obj.value)
#print (col_list)
Data = {}
sheet_data2 = {}
for row in sheet.iter_rows(min_row=2,values_only= True):
sheet_name = row[0]
for i in range(1, max_col):
temp = {
col_list[i]:row[i]
}
sheet_data2.update(temp)
Data[sheet_name] = sheet_data2
#print(Data)
My current output:
{1: {'A': 83, 'B': 1, 'C': 25}, 2: {'A': 83, 'B': 1, 'C': 25}, 3: {'A': 83, 'B': 1, 'C': 25}, 4: {'A': 83, 'B': 1, 'C': 25}}
Output i want:
{1: {'A': 30, 'B': 5, 'C': 2}, 2: {'A': 10, 'B': 9, 'C': 8}, 3: {'A': 65, 'B': 7, 'C': 42}, 4: {'A': 83, 'B': 1, 'C': 25}}
My nested dictionary keeps getting overwritten by the last row of values.
Also, if i want to extract a value (eg. A: 30) from my nested dictionary, how do i do so without knowing the column names?
CodePudding user response:
When you use Data[sheet_name] = sheet_data2
, it only stores the reference of the sheet_data2
dictionary in the Data
dictionary and not the values. So all of your Data[sheet_name]
are pointing to the same data, and when you change sheet_data2
in the next iteration of the loop, it overwrites the previous sheet data you had.
You can put sheet_data2
in the for loop, so it creates a new dictionary in each iteration:
Data = {}
for row in sheet.iter_rows(min_row=2,values_only= True):
sheet_data2 = {}
sheet_name = row[0]
for i in range(1, max_col):
temp = {
col_list[i]:row[i]
}
sheet_data2.update(temp)
Data[sheet_name] = sheet_data2