Home > Net >  Python: Convert excel table to a specific JSON format
Python: Convert excel table to a specific JSON format

Time:03-31

I have an excel table with a format:

excel data format

I want to convert this table to a JSON format which looks like this:

data = {'Program': {1: {'Name': 'John', 'Program': 'BS', 'Age': 29}, 2: {'Name': 'Doe', 'Program': 'MS', 'Age': 35}},  'Locations': {'New York': {1: 78, 2: 80, 3: 36, 4: 44}, 'Chicago': {1: 68, 2: 53, 3: 87, 4: 130}, 'Houston': {1: 57, 2: 89, 3: 64, 4: 77}, 'Alabama': {1: 98, 2: 124, 3: 73, 4: 82}}, 'name_ratings': {'John': 0.2, 'Doe': 0.7, 'Jessica': 0.4, 'Alley': 0.9}}

I am using openpyxl to load the excel file in Python and iterating over rows.

for col in sheet.iter_rows(min_row=1, min_col=1, max_row=5, max_col=8):
    for cell in col:
        print(cell.value)

Can anyone please help me with this?

File attached: sample excel file

Thanks.

CodePudding user response:

First, sheet.iter_rows gives you rows, not columns. If you don't care about the column names as given in the sheet, you may want pass in min_row=2.

for row in sheet.iter_rows(min_row=2, min_col=1, max_row=5, max_col=8):
    print([cell.value for cell in row])

['John', 'BS', 29, 0.2, 78, 68, 57, 98]
['Doe', 'MS', 35, 0.7, 80, 53, 89, 124]
['Jessica', 'MS', 26, 0.4, 36, 87, 64, 73]
['Alley', 'BS', 33, 0.9, 44, 130, 77, 82]

Then you can do your aggregations

name_ratings = {}
programs = {}  # fill all this in
...
for row in sheet.iter_rows(min_row=2, min_col=1, max_row=5, max_col=8):
    name_ratings[row[0].value] = row[3].value
    ...
{'John': 0.2, 'Doe': 0.7, 'Jessica': 0.4, 'Alley': 0.9}
...
result = json.dumps({"name_ratings": name_ratings, "programs": programs})  # and the other values

For easier access, you could also use a different library like pandas to load the data

import pandas
pandas.read_excel(path)

      Name Program  Age  Rating  New York  Chicago  Houston  Alabama
0     John      BS   29     0.2        78       68       57       98
1      Doe      MS   35     0.7        80       53       89      124
2  Jessica      MS   26     0.4        36       87       64       73
3    Alley      BS   33     0.9        44      130       77       82

CodePudding user response:

It's usually more efficient if you can avoid iterating by rows.

For this you can simply manipulate the dataframe according to how to structure the json then use zip or .to_json() to then construct your final dictionary/json.

import pandas as pd
import json

df = pd.read_excel('D:/test/sample.xlsx')

program_cols = ['Name','Program','Age']
rating_cols = ['Name','Rating']
location_cols = [col for col in df.columns if col not in program_cols   rating_cols ]


programs = json.loads(df[program_cols].to_json(orient='index'))
locations = json.loads(df[location_cols].T.to_json(orient='index'))
name_ratings = dict (zip(df['Name'], df['Rating']))

data = {'Program': programs,
        'Locations': locations,
        'name_ratings': name_ratings}

Output:

print(data)
{'Program': {'0': {'Name': 'John', 'Program': 'BS', 'Age': 29.0}, '1': {'Name': 'Doe', 'Program': 'MS', 'Age': 35.0}, '2': {'Name': 'Jessica', 'Program': 'MS', 'Age': 26.0}, '3': {'Name': 'Alley', 'Program': 'BS', 'Age': 33.0}}, 'Locations': {'New York': {'0': 78.0, '1': 80.0, '2': 36.0, '3': 44.0}, 'Chicago': {'0': 68.0, '1': 53.0, '2': 87.0, '3': 130.0}, ' Houston': {'0': 57.0, '1': 89.0, '2': 64.0, '3': 77.0}, 'Alabama': {'0': 98.0, '1': 124.0, '2': 73.0, '3': 82.0}}, 'name_ratings': {'John': 0.2, 'Doe': 0.7, 'Jessica': 0.4, 'Alley': 0.9}}
  • Related