Python: dynamically create a dictionary

Given a list of strings in Python:

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

where, once an entry is split on whitespace into a list s:
s[0] = student ID,
s[1] = problem ID,
s[2] = score for the problem

I want to find out whether each student solved the same number of problems. For example, student 0001 has 6 log entries and student 0002 has 5, but student 0001 attempted problem #5 twice, so both student 0001 and student 0002 solved 5 problems. I also need to check whether the students solved the same problem numbers and received the same score on each attempted problem. How do I write this in Pythonic code?

Answer:

To do that, iterate over the list of strings and split each string on whitespace:

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]
for log in logs:
    s = log.split()  # s[0] = student ID, s[1] = problem ID, s[2] = score
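Splitting each entry is only the first step. As a minimal sketch of one way to continue, assuming "solved" means "attempted at least once", you can collect each student's problem IDs in a set so that a repeated attempt (like student 0001 on problem #5) is counted only once. The names `solved` and `counts` are illustrative.

```python
from collections import defaultdict

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

# student ID -> set of problem IDs; the set silently drops repeated attempts
solved = defaultdict(set)
for log in logs:
    student_id, problem_id, _score = log.split()
    solved[student_id].add(problem_id)

# number of distinct problems attempted per student
counts = {student: len(problems) for student, problems in solved.items()}
print(counts)  # {'0001': 5, '0002': 5, '0003': 1}
```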

Answer:

You will need several different groupings (dictionaries) to analyze the data along all of these axes.

First collate the information into the various grouping axes:

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95",
        "0001 7 80", "0001 8 80", "0001 10 90", "0002 10 90",
        "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

students = dict() # {studentID: {problemID: max Score}} nested dictionaries
problems = dict() # {problemID: {studentIDs}} dictionary of sets
results  = dict() # {(problemID,result): {studentIDs}} matching results
for s,p,r in map(str.split,logs):
    scores = students.setdefault(s,dict()) # track problems per student
    scores[p] = max(scores.get(p,r),r,key=int) # max score; key=int avoids string order ('90' > '100')
    problems.setdefault(p,set()).add(s)    # add student to problem's set
    results.setdefault((p,r),set()).add(s) # add student to problem/result

Then you can query these data structures to obtain the insight you are looking for.

Raw groupings:

# problems solved by each student with their maximum result
print(students)
{'0001': {'3': '95', '5': '100', '7': '80', '8': '80', '10': '90'},
 '0002': {'3': '95', '10': '90', '7': '80', '8': '80', '5': '100'},
 '0003': {'99': '90'}}
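The question also asks whether students solved the same problems with the same scores. Because dict equality ignores key order, comparing two students' {problem: best score} dictionaries answers both parts in a single expression. A minimal, self-contained sketch; the name `best` is illustrative, and scores are converted to `int` (an assumption that they are always whole numbers) so they compare numerically.

```python
logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

# student ID -> {problem ID: best score as int}
best = {}
for student_id, problem_id, score in map(str.split, logs):
    points = int(score)
    scores = best.setdefault(student_id, {})
    scores[problem_id] = max(scores.get(problem_id, points), points)

# equal only if both students have the same problems AND the same best scores
print(best["0001"] == best["0002"])  # True
print(best["0001"] == best["0003"])  # False
```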

# list of students that solved each problem
print(problems)
{'3': {'0002', '0001'},
 '5': {'0002', '0001'},
 '7': {'0002', '0001'},
 '8': {'0002', '0001'},
 '10': {'0002', '0001'},
 '99': {'0003'}}
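The `problems` grouping answers "who attempted this problem?". The inverse question, which students attempted exactly the same set of problems, can be answered by keying a dictionary on each student's problem set converted to a `frozenset` (a plain `set` is not hashable, so it cannot be a key). A self-contained sketch; `attempted` and `same_problems` are illustrative names.

```python
from collections import defaultdict

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

# student ID -> set of problem IDs attempted
attempted = defaultdict(set)
for student_id, problem_id, _score in map(str.split, logs):
    attempted[student_id].add(problem_id)

# identical problem sets map to the same frozenset key
same_problems = defaultdict(list)
for student_id, probs in attempted.items():
    same_problems[frozenset(probs)].append(student_id)

for probs, ids in same_problems.items():
    print(sorted(probs, key=int), ids)
# ['3', '5', '7', '8', '10'] ['0001', '0002']
# ['99'] ['0003']
```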

# list of students that got a specific result on each problem
print(results)
{('3', '95'): {'0002', '0001'}, ('5', '90'): {'0001'},
 ('5', '100'): {'0002', '0001'}, ('7', '80'): {'0002', '0001'},
 ('8', '80'): {'0002', '0001'}, ('10', '90'): {'0002', '0001'},
 ('99', '90'): {'0003'}}

Derived information by aggregation / filtering:

# number of problems solved per student
print({s:len(pr) for s,pr in students.items()}) 
{'0001': 5, '0002': 5, '0003': 1}
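To answer the first part of the question directly (is the count the same for every student?), it is enough to check whether the counts collapse to a single value. The sketch below starts from the dictionary printed above; `all_same`, `typical`, and `outliers` are illustrative names, and treating the most common count as the "expected" one is an assumption.

```python
from collections import Counter

counts = {'0001': 5, '0002': 5, '0003': 1}  # output of the comprehension above

# a single distinct value means every student solved the same number of problems
all_same = len(set(counts.values())) == 1

# report the students whose count differs from the most common count
typical, _ = Counter(counts.values()).most_common(1)[0]
outliers = [s for s, n in counts.items() if n != typical]

print(all_same, outliers)  # False ['0003']
```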
    
# students that got the same score on the same problem (plagiarism?)
for (prob,result),ids in results.items():   # 'ids' avoids overwriting the students dict
    if len(ids)>1:
        print(f"# same result ({result}) on problem #{prob} :",ids)

# same result (95) on problem #3 : {'0001', '0002'}
# same result (100) on problem #5 : {'0001', '0002'}
# same result (80) on problem #7 : {'0001', '0002'}
# same result (80) on problem #8 : {'0001', '0002'}
# same result (90) on problem #10 : {'0001', '0002'}
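To turn those per-problem matches into a per-pair summary, you can count how many identical (problem, result) pairs each pair of students shares. This sketch copies the `results` dictionary printed earlier; note that it counts every attempt, not just the best score, so a student who retried a problem could match on more than one score for it. `shared` is an illustrative name.

```python
from collections import Counter
from itertools import combinations

# {(problemID, result): {studentIDs}} as printed earlier
results = {('3', '95'): {'0002', '0001'}, ('5', '90'): {'0001'},
           ('5', '100'): {'0002', '0001'}, ('7', '80'): {'0002', '0001'},
           ('8', '80'): {'0002', '0001'}, ('10', '90'): {'0002', '0001'},
           ('99', '90'): {'0003'}}

shared = Counter()
for ids in results.values():
    # sorting keeps each pair keyed in a consistent order
    for pair in combinations(sorted(ids), 2):
        shared[pair] += 1

print(shared)  # Counter({('0001', '0002'): 5})
```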

Note that a relational database is usually a better tool to perform this type of analysis.
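As a rough illustration of the database route, Python's built-in `sqlite3` module can load the logs into an in-memory table and let SQL do the grouping. The table and column names (`logs`, `student`, `problem`, `score`) are assumptions for this sketch.

```python
import sqlite3

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logs (student TEXT, problem INTEGER, score INTEGER)")
# convert problem and score to int so SQL compares them numerically
con.executemany("INSERT INTO logs VALUES (?, ?, ?)",
                ((s, int(p), int(r)) for s, p, r in map(str.split, logs)))

# distinct problems per student (repeated attempts counted once)
per_student = con.execute("""
    SELECT student, COUNT(DISTINCT problem)
    FROM logs
    GROUP BY student
    ORDER BY student
""").fetchall()
print(per_student)  # [('0001', 5), ('0002', 5), ('0003', 1)]

# best score per problem for one student
best = con.execute(
    "SELECT problem, MAX(score) FROM logs WHERE student = ? "
    "GROUP BY problem ORDER BY problem",
    ("0001",),
).fetchall()
print(best)  # [(3, 95), (5, 100), (7, 80), (8, 80), (10, 90)]

con.close()
```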