Group by value in a list of dictionaries whose values are lists of dictionaries

I have a list of dictionaries, as given below.

d = [
{'value': [{'student_id': 45522, 'marks': 75}, {'student_id': 70515, 'marks': 80}],'year':2001},
{'value': [{'student_id': 45522, 'marks': 35}, {'student_id': 70515, 'marks': 90}],'year':2002},
{'value': [{'student_id': 45522, 'marks': 60}, {'student_id': 70515, 'marks': 89}],'year':2003}
]

I want the result below.

student_marks_data = [
    {"student_id": 45522, "years": [2001, 2002, 2003], "marks": [75, 35, 60]},
    {"student_id": 70515, "years": [2001, 2002, 2003], "marks": [80, 90, 89]}
]

I have read about itertools but don't know how to use it to solve this.

CodePudding user response:

I'd split this into two functions: first group the data by student, then restructure the grouped data into your desired format.

Another solution, and one that I personally prefer, would be to represent the data with data classes and properly parse the JSON file with something like pyserde.

The code for the two-function approach is below.

# Easy solution without any external libraries

def group_students(data: list[dict]) -> dict[int, dict[str, list[int]]]:
    """Map each student_id to its years and marks, in input order."""
    students: dict[int, dict[str, list[int]]] = {}
    for entry in data:
        year = entry["year"]
        for student in entry["value"]:
            student_id = student["student_id"]
            marks = student["marks"]

            if student_id not in students:
                students[student_id] = {"years": [year], "marks": [marks]}
            else:
                students[student_id]["years"].append(year)
                students[student_id]["marks"].append(marks)

    return students


def restructure_data(data: dict[int, dict[str, list[int]]]) -> list[dict]:
    """Turn the grouped mapping into a list with one dict per student."""
    students: list = []

    for student_id, student_data in data.items():
        students.append({
            "student_id": student_id,
            "years": student_data["years"],
            "marks": student_data["marks"]
        })

    return students


if __name__ == "__main__":
    d = [
        {'value': [{'student_id': 45522, 'marks': 75}, {'student_id': 70515, 'marks': 80}],'year':2001},
        {'value': [{'student_id': 45522, 'marks': 35}, {'student_id': 70515, 'marks': 90}],'year':2002},
        {'value': [{'student_id': 45522, 'marks': 60}, {'student_id': 70515, 'marks': 89}],'year':2003}
    ]

    grouped = group_students(d)
    student_marks_data = restructure_data(grouped)
    print(student_marks_data)
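
The data class idea mentioned above could look roughly like the sketch below. It uses only the standard library's dataclasses module, not pyserde, and the class name StudentMarks and the helper collect_marks are hypothetical names chosen for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class StudentMarks:
    # Hypothetical container: the class and field names are illustrative.
    student_id: int
    years: list[int] = field(default_factory=list)
    marks: list[int] = field(default_factory=list)


def collect_marks(data: list[dict]) -> list[StudentMarks]:
    """Build one StudentMarks record per student_id, in first-seen order."""
    by_id: dict[int, StudentMarks] = {}
    for entry in data:
        for student in entry["value"]:
            sid = student["student_id"]
            # Create the record on first sight, then reuse it.
            if sid not in by_id:
                by_id[sid] = StudentMarks(sid)
            by_id[sid].years.append(entry["year"])
            by_id[sid].marks.append(student["marks"])
    return list(by_id.values())


d = [
    {'value': [{'student_id': 45522, 'marks': 75}, {'student_id': 70515, 'marks': 80}], 'year': 2001},
    {'value': [{'student_id': 45522, 'marks': 35}, {'student_id': 70515, 'marks': 90}], 'year': 2002},
    {'value': [{'student_id': 45522, 'marks': 60}, {'student_id': 70515, 'marks': 89}], 'year': 2003},
]

records = collect_marks(d)
# asdict converts each record back to a plain dict in field order.
student_marks_data = [asdict(record) for record in records]
print(student_marks_data)
```

Working with typed records rather than nested dicts means a misspelled field name fails immediately as an AttributeError instead of silently creating a new key.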

CodePudding user response:

Try:

tmp = {}
for dct in d:
    for student in dct["value"]:
        tmp.setdefault(student["student_id"], []).append((dct["year"], student["marks"]))

student_marks_data = []
for k, v in tmp.items():
    student_marks_data.append(
        {
            "student_id": k,
            "years": [y for y, _ in v],
            "marks": [m for _, m in v],
        }
    )

print(student_marks_data)

Prints:

[
    {"student_id": 45522, "years": [2001, 2002, 2003], "marks": [75, 35, 60]},
    {"student_id": 70515, "years": [2001, 2002, 2003], "marks": [80, 90, 89]},
]
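
Since the question mentions itertools, here is a sketch of how itertools.groupby could be used for the same task. It is not taken from either answer above. One thing to keep in mind is that groupby only merges adjacent items, so the flattened rows have to be sorted by student_id first. The intermediate name rows is just an illustrative choice.

```python
from itertools import groupby
from operator import itemgetter

d = [
    {'value': [{'student_id': 45522, 'marks': 75}, {'student_id': 70515, 'marks': 80}], 'year': 2001},
    {'value': [{'student_id': 45522, 'marks': 35}, {'student_id': 70515, 'marks': 90}], 'year': 2002},
    {'value': [{'student_id': 45522, 'marks': 60}, {'student_id': 70515, 'marks': 89}], 'year': 2003},
]

# Flatten the nested structure into (student_id, year, marks) rows.
rows = [
    (student["student_id"], entry["year"], student["marks"])
    for entry in d
    for student in entry["value"]
]

# groupby only groups consecutive items, so sort by student_id first.
# The sort is stable, so each student's years keep their input order.
rows.sort(key=itemgetter(0))

student_marks_data = []
for student_id, group in groupby(rows, key=itemgetter(0)):
    group = list(group)  # the group iterator can only be consumed once
    student_marks_data.append({
        "student_id": student_id,
        "years": [year for _, year, _ in group],
        "marks": [marks for _, _, marks in group],
    })

print(student_marks_data)
```

Compared with the dictionary-based approaches, this needs an extra sort and returns the students ordered by student_id rather than in first-seen order.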