I have an excel table with a format:
I want to convert this table to a JSON format which looks like this:
data = {'Program': {1: {'Name': 'John', 'Program': 'BS', 'Age': 29}, 2: {'Name': 'Doe', 'Program': 'MS', 'Age': 35}}, 'Locations': {'New York': {1: 78, 2: 80, 3: 36, 4: 44}, 'Chicago': {1: 68, 2: 53, 3: 87, 4: 130}, 'Houston': {1: 57, 2: 89, 3: 64, 4: 77}, 'Alabama': {1: 98, 2: 124, 3: 73, 4: 82}}, 'name_ratings': {'John': 0.2, 'Doe': 0.7, 'Jessica': 0.4, 'Alley': 0.9}}
I am using openpyxl to load the excel file in Python and iterating over rows.
for col in sheet.iter_rows(min_row=1, min_col=1, max_row=5, max_col=8):
for cell in col:
print(cell.value)
Can anyone please help me with this?
File attached: sample excel file
Thanks.
CodePudding user response:
First, sheet.iter_rows
gives you rows, not columns. If you don't care about the column names as given in the sheet, you may want pass in min_row=2
.
for row in sheet.iter_rows(min_row=2, min_col=1, max_row=5, max_col=8):
print([cell.value for cell in row])
['John', 'BS', 29, 0.2, 78, 68, 57, 98]
['Doe', 'MS', 35, 0.7, 80, 53, 89, 124]
['Jessica', 'MS', 26, 0.4, 36, 87, 64, 73]
['Alley', 'BS', 33, 0.9, 44, 130, 77, 82]
Then you can do your aggregations
name_ratings = {}
programs = {} # fill all this in
...
for row in sheet.iter_rows(min_row=2, min_col=1, max_row=5, max_col=8):
name_ratings[row[0].value] = row[3].value
...
{'John': 0.2, 'Doe': 0.7, 'Jessica': 0.4, 'Alley': 0.9}
...
result = json.dumps({"name_ratings": name_ratings, "programs": programs}) # and the other values
For easier access, you could also use a different library like pandas to load the data
import pandas
pandas.read_excel(path)
Name Program Age Rating New York Chicago Houston Alabama
0 John BS 29 0.2 78 68 57 98
1 Doe MS 35 0.7 80 53 89 124
2 Jessica MS 26 0.4 36 87 64 73
3 Alley BS 33 0.9 44 130 77 82
CodePudding user response:
It's usually more efficient if you can avoid iterating by rows.
For this you can simply manipulate the dataframe according to how to structure the json then use zip
or .to_json()
to then construct your final dictionary/json.
import pandas as pd
import json
df = pd.read_excel('D:/test/sample.xlsx')
program_cols = ['Name','Program','Age']
rating_cols = ['Name','Rating']
location_cols = [col for col in df.columns if col not in program_cols rating_cols ]
programs = json.loads(df[program_cols].to_json(orient='index'))
locations = json.loads(df[location_cols].T.to_json(orient='index'))
name_ratings = dict (zip(df['Name'], df['Rating']))
data = {'Program': programs,
'Locations': locations,
'name_ratings': name_ratings}
Output:
print(data)
{'Program': {'0': {'Name': 'John', 'Program': 'BS', 'Age': 29.0}, '1': {'Name': 'Doe', 'Program': 'MS', 'Age': 35.0}, '2': {'Name': 'Jessica', 'Program': 'MS', 'Age': 26.0}, '3': {'Name': 'Alley', 'Program': 'BS', 'Age': 33.0}}, 'Locations': {'New York': {'0': 78.0, '1': 80.0, '2': 36.0, '3': 44.0}, 'Chicago': {'0': 68.0, '1': 53.0, '2': 87.0, '3': 130.0}, ' Houston': {'0': 57.0, '1': 89.0, '2': 64.0, '3': 77.0}, 'Alabama': {'0': 98.0, '1': 124.0, '2': 73.0, '3': 82.0}}, 'name_ratings': {'John': 0.2, 'Doe': 0.7, 'Jessica': 0.4, 'Alley': 0.9}}