The list I have:
[
"Mathematics-2 (21SMT-125)",
"Mid-Semester Test-1",
"40",
"23.5",
"Mid-Semester Test-2",
"40",
"34",
"Disruptive Technologies - 2 (21ECH-103)",
"Experiment-1",
"20",
"19",
"Experiment-2",
"20",
"17",
"Experiment-3",
"20",
"18.5",
]
This list of strings is parsed from HTML using bs4.
The format I want to convert it to:
{
    "Subject": {
        "Mathematics-2 (21SMT-125)": {
            "Mid-Semester Test-1": [40, 23.5],
            "Mid-Semester Test-2": [40, 34]
        },
        "Disruptive Technologies - 2 (21ECH-103)": {
            "Experiment-1": [20, 19],
            "Experiment-2": [20, 17],
            "Experiment-3": [20, 18.5]
        }
    }
}
CodePudding user response:
The problem is that the list you provided is a flat list of items with no indicator of their hierarchical position in the desired structure.
One approach: if the entries that represent a parent object (Mathematics, etc.) are the only ones that contain parentheses, you could iterate over the list and use string matching or a regex to identify each parent and create a top-level object for it. Each following name then becomes a key under that parent, with the next two entries stored as its value in a list.
This assumes that you'll always have two subsequent values at the child level. If the number of attributes isn't fixed but they're always numeric you could use regex to determine if it's numeric or non-numeric and keep adding items to the value list until you hit another non-numeric entry, which would be treated as the next sibling in the hierarchy.
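A minimal sketch of the variable-length variant described above, assuming that subject names are the only entries containing "(" and that every score is a plain non-negative number. The names `group_items` and `NUMERIC` are made up for illustration, and the sample list is a subset of the data in the question.

```python
import re

# Assumption: scores look like "40" or "23.5" (no signs, no exponents).
NUMERIC = re.compile(r"^\d+(\.\d+)?$")


def group_items(items):
    """Group a flat list into {subject: {entry: [numbers, ...]}}."""
    result = {}
    current_subject = None  # dict of entries for the subject being filled
    current_values = None   # list of numbers for the entry being filled
    for item in items:
        if NUMERIC.match(item):
            if current_values is None:
                raise ValueError(f"number {item!r} has no entry name before it")
            current_values.append(float(item))
        elif "(" in item:
            # A parenthesised code marks a new top-level subject.
            current_subject = result.setdefault(item, {})
            current_values = None
        else:
            if current_subject is None:
                raise ValueError(f"entry {item!r} has no subject before it")
            # Any other non-numeric string starts the next sibling entry.
            current_values = current_subject.setdefault(item, [])
    return result


items = [
    "Mathematics-2 (21SMT-125)",
    "Mid-Semester Test-1", "40", "23.5",
    "Disruptive Technologies - 2 (21ECH-103)",
    "Experiment-1", "20", "19",
]
grouped = group_items(items)
print(grouped)
```

Because numbers are appended until the next non-numeric string appears, the same loop handles entries with more or fewer than two values.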
CodePudding user response:
I would review the approach and check whether the information from bs4 can be parsed in a smarter way: try to scrape in several steps, first reaching the subject, then the "Semester/Experiment" entries, then the grades.
If that's not possible and the data returned from bs4 cannot be changed, the only thing you can do is determine whether each string is the name of a subject, the name of a semester/experiment, or a grade/score, and walk the list with a loop. The subject name seems to end with a special code, which a regexp can distinguish from the name of a semester/experiment, and a grade/score can always be parsed to a number.
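The classification step could look roughly like the sketch below. The pattern `SUBJECT_CODE` is only a guess based on the two codes visible in the question, `(21SMT-125)` and `(21ECH-103)`, and `classify` is a made-up helper name; adjust both to the real data.

```python
import re

# Assumption: a subject name ends with a code such as "(21SMT-125)".
SUBJECT_CODE = re.compile(r"\(\w+-\d+\)\s*$")


def classify(text):
    """Return "score", "subject" or "item" for one string from the list."""
    try:
        float(text)
        return "score"
    except ValueError:
        pass
    if SUBJECT_CODE.search(text):
        return "subject"
    return "item"


for sample in ["Mathematics-2 (21SMT-125)", "Mid-Semester Test-1", "23.5"]:
    print(sample, "->", classify(sample))
```

Checking for a number first keeps the regexp from ever being applied to score strings.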
CodePudding user response:
For data exactly like yours (where a string containing a "(" denotes a top-level entry, and there are always two numbers per entry), you could come up with a state-machine sort of thing like the one below. But as I commented, you really should improve your parsing code instead, since the HTML you're scraping your data from is likely already structured.
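def is_float(s):
    """Return True if the string can be parsed as a float."""
    try:
        float(s)
        return True
    except ValueError:
        return False


def parse_inp(inp):
    """Map each (subject, entry) key path to its pair of numbers."""
    flat_map = {}
    stack = []  # key path of the entry currently being read
    x = 0
    while x < len(inp):
        if "(" in inp[x]:
            # A new top-level subject starts: reset the key path.
            stack.clear()
        if is_float(inp[x]) and is_float(inp[x + 1]):
            flat_map[tuple(stack)] = (float(inp[x]), float(inp[x + 1]))
            x += 2
            stack.pop(-1)  # drop the entry name, keep the subject
            continue
        stack.append(inp[x])
        x += 1
    return flat_map


def nest_flat_map(flat_map):
    """Turn {(key, subkey): values} into nested dictionaries."""
    root = {}
    for key_path, values_list in flat_map.items():
        dst = root
        for key in key_path[:-1]:
            dst = dst.setdefault(key, {})
        dst[key_path[-1]] = values_list
    return root


inp = [
    # ... data from original post
]
nested_map = nest_flat_map(parse_inp(inp))
print(nested_map)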
This outputs the expected structure (pretty-printed here; note that the values are tuples of floats and there is no top-level "Subject" key):
{
    "Mathematics-2 (21SMT-125)": {
        "Mid-Semester Test-1": (40.0, 23.5),
        "Mid-Semester Test-2": (40.0, 34.0),
    },
    "Disruptive Technologies - 2 (21ECH-103)": {
        "Experiment-1": (20.0, 19.0),
        "Experiment-2": (20.0, 17.0),
        "Experiment-3": (20.0, 18.5),
    },
}
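To get from this result to the exact shape asked for in the question, one possible follow-up step is to wrap the mapping under a "Subject" key and turn each tuple into a list. This is only a sketch: `to_requested_format` is a hypothetical helper, and the small `nested_map` literal below stands in for the value returned by `nest_flat_map`.

```python
import json


def to_requested_format(nested_map):
    """Wrap the subjects under "Subject" and convert score tuples to lists."""
    return {
        "Subject": {
            subject: {name: list(scores) for name, scores in entries.items()}
            for subject, entries in nested_map.items()
        }
    }


# Stand-in for the result of nest_flat_map(parse_inp(inp)).
nested_map = {
    "Mathematics-2 (21SMT-125)": {"Mid-Semester Test-1": (40.0, 23.5)},
}
converted = to_requested_format(nested_map)
print(json.dumps(converted, indent=4))
```

The scores remain floats, so whole numbers are serialized as 40.0 rather than 40.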