Python: Convert excel table to a specific JSON format-CodePudding

I have an excel table with a format:

I want to convert this table to a JSON format which looks like this:

data = {'Program': {1: {'Name': 'John', 'Program': 'BS', 'Age': 29}, 2: {'Name': 'Doe', 'Program': 'MS', 'Age': 35}},  'Locations': {'New York': {1: 78, 2: 80, 3: 36, 4: 44}, 'Chicago': {1: 68, 2: 53, 3: 87, 4: 130}, 'Houston': {1: 57, 2: 89, 3: 64, 4: 77}, 'Alabama': {1: 98, 2: 124, 3: 73, 4: 82}}, 'name_ratings': {'John': 0.2, 'Doe': 0.7, 'Jessica': 0.4, 'Alley': 0.9}}

I am using openpyxl to load the excel file in Python and iterating over rows.

for col in sheet.iter_rows(min_row=1, min_col=1, max_row=5, max_col=8):
    for cell in col:
        print(cell.value)

Can anyone please help me with this?

File attached: sample excel file

Thanks.

CodePudding user response：

First, sheet.iter_rows gives you rows, not columns. If you don't care about the column names as given in the sheet, you may want pass in min_row=2.

for row in sheet.iter_rows(min_row=2, min_col=1, max_row=5, max_col=8):
    print([cell.value for cell in row])

['John', 'BS', 29, 0.2, 78, 68, 57, 98]
['Doe', 'MS', 35, 0.7, 80, 53, 89, 124]
['Jessica', 'MS', 26, 0.4, 36, 87, 64, 73]
['Alley', 'BS', 33, 0.9, 44, 130, 77, 82]

Then you can do your aggregations

name_ratings = {}
programs = {}  # fill all this in
...
for row in sheet.iter_rows(min_row=2, min_col=1, max_row=5, max_col=8):
    name_ratings[row[0].value] = row[3].value
    ...
{'John': 0.2, 'Doe': 0.7, 'Jessica': 0.4, 'Alley': 0.9}
...
result = json.dumps({"name_ratings": name_ratings, "programs": programs})  # and the other values

For easier access, you could also use a different library like pandas to load the data

import pandas
pandas.read_excel(path)

      Name Program  Age  Rating  New York  Chicago  Houston  Alabama
0     John      BS   29     0.2        78       68       57       98
1      Doe      MS   35     0.7        80       53       89      124
2  Jessica      MS   26     0.4        36       87       64       73
3    Alley      BS   33     0.9        44      130       77       82

CodePudding user response：

It's usually more efficient if you can avoid iterating by rows.

For this you can simply manipulate the dataframe according to how to structure the json then use zip or .to_json() to then construct your final dictionary/json.

import pandas as pd
import json

df = pd.read_excel('D:/test/sample.xlsx')

program_cols = ['Name','Program','Age']
rating_cols = ['Name','Rating']
location_cols = [col for col in df.columns if col not in program_cols   rating_cols ]


programs = json.loads(df[program_cols].to_json(orient='index'))
locations = json.loads(df[location_cols].T.to_json(orient='index'))
name_ratings = dict (zip(df['Name'], df['Rating']))

data = {'Program': programs,
        'Locations': locations,
        'name_ratings': name_ratings}

Output:

print(data)
{'Program': {'0': {'Name': 'John', 'Program': 'BS', 'Age': 29.0}, '1': {'Name': 'Doe', 'Program': 'MS', 'Age': 35.0}, '2': {'Name': 'Jessica', 'Program': 'MS', 'Age': 26.0}, '3': {'Name': 'Alley', 'Program': 'BS', 'Age': 33.0}}, 'Locations': {'New York': {'0': 78.0, '1': 80.0, '2': 36.0, '3': 44.0}, 'Chicago': {'0': 68.0, '1': 53.0, '2': 87.0, '3': 130.0}, ' Houston': {'0': 57.0, '1': 89.0, '2': 64.0, '3': 77.0}, 'Alabama': {'0': 98.0, '1': 124.0, '2': 73.0, '3': 82.0}}, 'name_ratings': {'John': 0.2, 'Doe': 0.7, 'Jessica': 0.4, 'Alley': 0.9}}