Please I need help merging two CSV files into one dictionary.
Dataset student name, matricNo, dob,address,department
- Margaret Chase, 201001,1976-12-02,USCGC Walker FPO AP 28772, Micro Biology
- Erica Woods,201002,2010-02-03,"565 Patricia Coves Apt. 659 Lake Brandiside, MS 61181",French
Records
title,unit,code,score,term,session,student_id
- English,2,ENG101,60,Fall,2020,201001
- Chemistry I,4,CHE101,70,Fall,2021,201001
- Chemistry I,4,CHE101,61,Fall,2021,201002
- Maths,3,MTH101,52,Spring,2020,201002
- English,2,ENG101,80,Fall,2020,201003
- Chemistry I,4,CHE101,56,Fall,2021,201003
- Geology I,4,Geo101,56,Fall,2021,201003
I have tried reading the two csv into separate data frames
Trying to merge the two df looks ok but converting the df into dictionary is not giving desired result.
What am getting when I try to convert the df to dictionary
Expected output
student_records = {
"name": "James Webb",
"id": "201003",
"courses":[
{"title": "English", "unit": 2, "code": "ENG101", "score": 60, "term": "Fall", "session":'2020'},
{"title": "Chemistry I", "unit": 4, "code": "CHE101", "score": 70, "term": "Fall", "session":'2021'},
{"title": "Maths", "unit": 3, "code": "MTH101", "score": 80, "term": "Spring", "session":'2020'},
{"title": "Chemistry II", "unit": 4, "code": "CHE102", "score": 91, "term": "Spring", "session":'2021'},
{"title": "History", "unit": 2, "code": "HIS102", "score": 40, "term": "Spring", "session":'2020'}
]
}
CodePudding user response:
You can probably skip pandas and just use the csv module. Conceptually we will read in students into a dictionary based on the key of student id to make lookups easier, Then we will read in classes and find the corresponding student to append the class to. Finally, we can take the values of our lookup as a list:
import csv
with open("students.csv", "r", encoding="utf-8") as students_in:
merged = {
student["matricNo"] : student
for student
in csv.DictReader(students_in, skipinitialspace=True)
}
with open("classes.csv", "r", encoding="utf-8") as classes_in:
for student_class in csv.DictReader(classes_in):
student_id = student_class["student_id"]
if student_id not in merged:
print(f"student not found: {student_id}")
continue
merged[student_id].setdefault("classes", []).append(student_class)
merged = list(merged.values())
print(merged)
Giving us:
[
{
"name": "Margaret Chase",
"matricNo": "201001",
"dob": "1976-12-02",
"address": "USCGC Walker FPO AP 28772",
"department": "Micro Biology",
"classes": [
{
"title": "English",
"unit": "2",
"code": "ENG101",
"score": "60",
"term": "Fall",
"session": "2020",
"student_id": "201001"
},
{
"title": "Chemistry I",
"unit": "4",
"code": "CHE101",
"score": "70",
"term": "Fall",
"session": "2021",
"student_id": "201001"
}
]
},
{
"name": "Erica Woods",
"matricNo": "201002",
"dob": "2010-02-03",
"address": "565 Patricia Coves Apt. 659 Lake Brandiside, MS 61181",
"department": "French",
"classes": [
{
"title": "Chemistry I",
"unit": "4",
"code": "CHE101",
"score": "61",
"term": "Fall",
"session": "2021",
"student_id": "201002"
},
{
"title": "Maths",
"unit": "3",
"code": "MTH101",
"score": "52",
"term": "Spring",
"session": "2020",
"student_id": "201002"
}
]
}
]