Given a list of strings in Python:
logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]
where each entry, split on whitespace into s, has:
s[0] = student ID,
s[1] = problem ID,
s[2] = score for the problem
I want to find out whether each student solved the same number of problems. For example, student 0001 has 6 entries and student 0002 has 5, but student 0001 attempted problem #5 twice, so both student 0001 and student 0002 actually solved 5 problems. I also need to check whether the students solved the same problem numbers and received the same score on each attempted problem. How do I write this in Pythonic code?
Answer:
To do that, iterate over the list and split each string on whitespace:
logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
"0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]
for log in logs:
    s = log.split()  # e.g. "0001 5 90" -> ['0001', '5', '90'] (student ID, problem ID, score)
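Splitting only produces the three fields. One way to finish this approach, sketched under the assumption that every line has exactly three whitespace-separated fields, is to unpack them and collect each student's distinct problem IDs in a set, so that a repeated attempt is counted only once:

```python
from collections import defaultdict

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

# studentID -> set of distinct problemIDs; a repeated attempt is stored once
attempted = defaultdict(set)
for log in logs:
    student, problem, _score = log.split()  # assumes exactly three fields per line
    attempted[student].add(problem)

counts = {student: len(problems) for student, problems in attempted.items()}
print(counts)  # {'0001': 5, '0002': 5, '0003': 1}
```

The set silently drops the second entry for problem #5, which is why student 0001 comes out at 5 rather than 6.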
Answer:
You will need several groupings (dictionaries) to analyze the data along each of these axes.
First collate the information into the various grouping axes:
logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95",
"0001 7 80", "0001 8 80", "0001 10 90", "0002 10 90",
"0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]
students = dict() # {studentID: {problemID: max Score}} nested dictionaries
problems = dict() # {problemID: {studentIDs}} dictionary of sets
results = dict() # {(problemID,result): {studentIDs}} matching results
for s, p, r in map(str.split, logs):
    scores = students.setdefault(s, dict())        # track problems per student
    scores[p] = max(scores.get(p, r), r, key=int)  # max score for student/problem; key=int because as strings '90' > '100'
    problems.setdefault(p, set()).add(s)           # add student to problem's set
    results.setdefault((p, r), set()).add(s)       # add student to problem/result
Then you can query these data structures to obtain the insight you are looking for.
Raw groupings:
# problems solved by each student with their maximum result
print(students)
{'0001': {'3': '95', '5': '100', '7': '80', '8': '80', '10': '90'},
 '0002': {'3': '95', '10': '90', '7': '80', '8': '80', '5': '100'},
 '0003': {'99': '90'}}
# list of students that solved each problem
print(problems)
{'3': {'0002', '0001'},
'5': {'0002', '0001'},
'7': {'0002', '0001'},
'8': {'0002', '0001'},
'10': {'0002', '0001'},
'99': {'0003'}}
# list of students that got a specific result on each problem
print(results)
{('3', '95'): {'0002', '0001'}, ('5', '90'): {'0001'},
('5', '100'): {'0002', '0001'}, ('7', '80'): {'0002', '0001'},
('8', '80'): {'0002', '0001'}, ('10', '90'): {'0002', '0001'},
('99', '90'): {'0003'}}
Derived information by aggregation / filtering:
# number of problems solved per student
print({s:len(pr) for s,pr in students.items()})
{'0001': 5, '0002': 5, '0003': 1}
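The per-student counts still have to be compared with each other, and the question also asks whether students share the same problems and scores. A minimal sketch, assuming the same nested {studentID: {problemID: best score}} layout but with scores converted to int so that 100 beats 90; `same_count` and `identical` are illustrative names, not part of the answer above:

```python
from itertools import combinations

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

# {studentID: {problemID: best score}}, scores compared as integers
students = {}
for s, p, r in map(str.split, logs):
    scores = students.setdefault(s, {})
    scores[p] = max(scores.get(p, int(r)), int(r))

# did every student solve the same number of distinct problems?
counts = {s: len(pr) for s, pr in students.items()}
same_count = len(set(counts.values())) == 1
print(same_count)  # False: student 0003 solved only one problem

# same problem numbers AND same best score on each, for one pair of students
print(students["0001"] == students["0002"])  # True

# every pair of students whose {problem: best score} maps are identical
identical = [(a, b) for a, b in combinations(students, 2)
             if students[a] == students[b]]
print(identical)  # [('0001', '0002')]
```

Dictionary equality ignores insertion order, so the different order in which 0001 and 0002 attempted their problems does not matter.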
# students that got the same score on the same problem (plagiarism?)
for (prob, result), ids in results.items():  # 'ids' avoids overwriting the students dict
    if len(ids) > 1:
        print(f"# same result ({result}) on problem #{prob} :", ids)
# same result (95) on problem #3 : {'0001', '0002'}
# same result (100) on problem #5 : {'0001', '0002'}
# same result (80) on problem #7 : {'0001', '0002'}
# same result (80) on problem #8 : {'0001', '0002'}
# same result (90) on problem #10 : {'0001', '0002'}
Note that a relational database is usually a better tool to perform this type of analysis.
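As a rough illustration of the database route, Python's built-in sqlite3 module can hold the logs in an in-memory table and answer both questions with GROUP BY queries. The table name `attempts` and its column names are assumptions made for this sketch:

```python
import sqlite3

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE attempts (student TEXT, problem TEXT, score INTEGER)")
# each split line supplies (student, problem, score); INTEGER affinity stores '95' as 95
conn.executemany("INSERT INTO attempts VALUES (?, ?, ?)", (log.split() for log in logs))

# distinct problems per student (repeat attempts counted once)
per_student = conn.execute(
    "SELECT student, COUNT(DISTINCT problem) FROM attempts "
    "GROUP BY student ORDER BY student"
).fetchall()
print(per_student)  # [('0001', 5), ('0002', 5), ('0003', 1)]

# problem/score pairs that more than one student obtained
shared = conn.execute(
    "SELECT problem, score, GROUP_CONCAT(DISTINCT student) FROM attempts "
    "GROUP BY problem, score HAVING COUNT(DISTINCT student) > 1"
).fetchall()
conn.close()
```

COUNT(DISTINCT problem) handles the repeated attempt at problem #5 the same way the nested dictionaries do.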