Home > Back-end >  Merge two CSV files and return output as dictionary Python
Merge two CSV files and return output as dictionary Python

Time:01-15

Please I need help merging two CSV files into one dictionary.

Dataset student name, matricNo, dob,address,department

  1. Margaret Chase, 201001,1976-12-02,USCGC Walker FPO AP 28772, Micro Biology
  2. Erica Woods,201002,2010-02-03,"565 Patricia Coves Apt. 659 Lake Brandiside, MS 61181",French

Records

title,unit,code,score,term,session,student_id

  1. English,2,ENG101,60,Fall,2020,201001
  2. Chemistry I,4,CHE101,70,Fall,2021,201001
  3. Chemistry I,4,CHE101,61,Fall,2021,201002
  4. Maths,3,MTH101,52,Spring,2020,201002
  5. English,2,ENG101,80,Fall,2020,201003
  6. Chemistry I,4,CHE101,56,Fall,2021,201003
  7. Geology I,4,Geo101,56,Fall,2021,201003

enter image description here

enter image description here

I have tried reading the two csv into separate data frames enter image description here

Trying to merge the two df looks ok but converting the df into dictionary is not giving desired result.

enter image description here

What am getting when I try to convert the df to dictionary enter image description here

Expected output

student_records = {
      "name": "James Webb",
      "id": "201003",
      "courses":[
         {"title": "English", "unit": 2, "code": "ENG101", "score": 60, "term": "Fall", "session":'2020'},
         {"title": "Chemistry I", "unit": 4, "code": "CHE101", "score": 70, "term": "Fall", "session":'2021'},
         {"title": "Maths", "unit": 3, "code": "MTH101", "score": 80, "term": "Spring", "session":'2020'},
         {"title": "Chemistry II", "unit": 4, "code": "CHE102", "score": 91, "term": "Spring", "session":'2021'}, 
         {"title": "History", "unit": 2, "code": "HIS102", "score": 40, "term": "Spring", "session":'2020'}
      ]
   }

CodePudding user response:

You can probably skip pandas and just use the csv module. Conceptually we will read in students into a dictionary based on the key of student id to make lookups easier, Then we will read in classes and find the corresponding student to append the class to. Finally, we can take the values of our lookup as a list:

import csv

with open("students.csv", "r", encoding="utf-8") as students_in:
    merged = {
        student["matricNo"] : student 
        for student
        in csv.DictReader(students_in, skipinitialspace=True)
    }

with open("classes.csv", "r", encoding="utf-8") as classes_in:
    for student_class in csv.DictReader(classes_in):
        student_id = student_class["student_id"]
        if student_id not in merged:
            print(f"student not found: {student_id}")
            continue
        merged[student_id].setdefault("classes", []).append(student_class)

merged = list(merged.values())

print(merged)

Giving us:

[
    {
        "name": "Margaret Chase",
        "matricNo": "201001",
        "dob": "1976-12-02",
        "address": "USCGC Walker FPO AP 28772",
        "department": "Micro Biology",
        "classes": [
            {
                "title": "English",
                "unit": "2",
                "code": "ENG101",
                "score": "60",
                "term": "Fall",
                "session": "2020",
                "student_id": "201001"
            },
            {
                "title": "Chemistry I",
                "unit": "4",
                "code": "CHE101",
                "score": "70",
                "term": "Fall",
                "session": "2021",
                "student_id": "201001"
            }
        ]
    },
    {
        "name": "Erica Woods",
        "matricNo": "201002",
        "dob": "2010-02-03",
        "address": "565 Patricia Coves Apt. 659 Lake Brandiside, MS 61181",
        "department": "French",
        "classes": [
            {
                "title": "Chemistry I",
                "unit": "4",
                "code": "CHE101",
                "score": "61",
                "term": "Fall",
                "session": "2021",
                "student_id": "201002"
            },
            {
                "title": "Maths",
                "unit": "3",
                "code": "MTH101",
                "score": "52",
                "term": "Spring",
                "session": "2020",
                "student_id": "201002"
            }
        ]
    }
]
  • Related