Python: most compact and efficient way of checking if an item is in any of the lists (which are inside a dict)

Time:11-30

I have a dictionary of lists (with strings inside) and I need to check if a string appears anywhere among those lists. Here is an example:

classes = {
  "class_A" : ["Mike","Alice","Peter"],
  "class_B" : ["Sam","Robert","Anna"],
  "class_C" : ["Tom","Nick","Jack"]
}

students=["Alice","Frodo","Jack"]
for student in students:
  if student *in any of the classes*:
    print("Is here")
  else:
    print("Is not here")

For every student in the list I provided: if that student is in any of the classes, do A; if not, do B. Currently the output is: Is here, Is not here, Is here.

Here is my current code:

studentsInClasses=[]
for studentsInClass in classes.values():
  studentsInClasses += studentsInClass

students=["Alice","Frodo","Jack"]
for student in students:
  if student in studentsInClasses:
    print("Is here")
  else:
    print("Is not here")

But this is happening inside a complex structure of classes, functions and loops, so it becomes a major performance issue as soon as I scale up the inputs. Here is something that I do like, but it is a bit annoying, as I have to make sure that whatever function my code is in has access to this one:

def check(student,classes):
  for value in classes.values():
    if student in value:
      return True
  return False

It is probably as good as it gets, but I would like to know if there is a simple one-liner that does the job.

Requirements:

  • Does not create a copy of the lists
  • Does not rely on keys in any way
  • Preferably nice and simple
  • Isn't an over-engineered superefficient solution

I am new to Stack Overflow; if I am making any major mistakes in the way I post, please do tell me and I will try to improve my question writing.

Thanks

CodePudding user response:

If a generator expression is acceptable with regard to your requirements, then:

def check(student, classes):
  return any(student in value for value in classes.values())

And to get a boolean for each student, you could create this function:

def checkall(students, classes):
  return [any(student in value for value in classes.values()) 
          for student in students]

For your sample data, this would return [True, False, True].
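
As a quick check, the two functions above can be run against the sample data from the question. This is only a usage sketch: `classes` and `students` are copied from the question, and `checkall` is written here in terms of `check` for brevity. Since `any()` stops at the first class that contains the student, no copy of the lists is made.

```python
def check(student, classes):
    # True as soon as one class list contains the student (short-circuits)
    return any(student in value for value in classes.values())

def checkall(students, classes):
    # One boolean per student, in the same order as `students`
    return [check(student, classes) for student in students]

classes = {
    "class_A": ["Mike", "Alice", "Peter"],
    "class_B": ["Sam", "Robert", "Anna"],
    "class_C": ["Tom", "Nick", "Jack"],
}
students = ["Alice", "Frodo", "Jack"]

for student in students:
    print("Is here" if check(student, classes) else "Is not here")

results = checkall(students, classes)
print(results)
```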

CodePudding user response:

So if you just want to print "is here" or "is not here", here is an example:

classes = {
  "class_A" : ["Mike","Alice","Peter"],
  "class_B" : ["Sam","Robert","Anna"],
  "class_C" : ["Tom","Nick","Jack"]
}
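student = "Alice"  # example name to look for
# Note: this is a substring match on the dict's text form, one chunk per comma
for line in str(classes).split(","):
    if student in line:
        print("Student in here")
    else:
        print("Student not here")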

CodePudding user response:

Since this is in a loop, you should create a set for all of the values:

from itertools import chain

values = set(chain(*classes.values()))

students=["Alice","Frodo","Jack"]

for student in students:
  if student in values:
    print("Is here")
  else:
    print("Is not here")

The reason is that a set lookup takes constant time on average, which makes a big difference in a tight loop.
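
To get a feel for the difference, here is a rough timing sketch. The enlarged `classes` dictionary and the helper names `in_lists` and `in_set` are made up for illustration, and the actual numbers will vary by machine. Note that building the set does create a new container, so it trades the "no copy" requirement for faster lookups.

```python
from itertools import chain
import timeit

# Hypothetical larger input, invented only to make the difference measurable
classes = {
    f"class_{i}": [f"student_{i}_{j}" for j in range(100)]
    for i in range(100)
}

def in_lists(student):
    # Scans the lists one by one until a match is found
    return any(student in value for value in classes.values())

# Built once, up front; chain.from_iterable avoids unpacking with *
values = set(chain.from_iterable(classes.values()))

def in_set(student):
    # Single hash lookup
    return student in values

missing = "Frodo"  # not present, so the list scan has to check every name

list_time = timeit.timeit(lambda: in_lists(missing), number=200)
set_time = timeit.timeit(lambda: in_set(missing), number=200)
print(f"list scan: {list_time:.4f}s, set lookup: {set_time:.4f}s")
```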

CodePudding user response:

How about making the list of students in each class a set? Then the lookup time within each class will be O(1), and you only loop over the n classes. Then you can have:

class_to_students = {
  "class_A" : {"Mike","Alice","Peter"},
  "class_B" : {"Sam","Robert","Anna"},
  "class_C" : {"Tom","Nick","Jack"}
}
students=["Alice","Frodo","Jack"]
for student in students:
  for class_students in class_to_students.values():
    if student in class_students:
      print(f"{student} Is here")
      break
  else:
    # loop was not broken out of
    print(f"{student} Is not here")

Output:

Alice Is here
Frodo Is not here
Jack Is here

If you exclude a solution like this, then you are stuck with your n*m solution where n is the number of classes and m is the number of students in each class.
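
If the data already exists as lists, one possible way to get there is to convert each list to a set once and then reuse the `any()` one-liner. This is a sketch under the assumption that a one-time conversion is acceptable; it does create new set objects (the strings themselves are shared), and the name `check` is borrowed from the question.

```python
classes = {
    "class_A": ["Mike", "Alice", "Peter"],
    "class_B": ["Sam", "Robert", "Anna"],
    "class_C": ["Tom", "Nick", "Jack"],
}

# One-time conversion: each class list becomes a set
class_to_students = {name: set(members) for name, members in classes.items()}

def check(student, class_to_students):
    # At most n set lookups (one per class) instead of scanning up to n*m names
    return any(student in members for members in class_to_students.values())

students = ["Alice", "Frodo", "Jack"]
results = {student: check(student, class_to_students) for student in students}
print(results)
```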

CodePudding user response:

Is this something you are looking for?

classes = {
  "class_A": ["Mike","Alice","Peter"],
  "class_B": ["Sam","Robert","Anna"],
  "class_C": ["Tom","Nick","Jack"]
}

students = ["Alice", "Frodo", "Jack"]
res = [student in class_ for class_ in classes.values() for student in students ]
print(res)

CodePudding user response:

  • If allowed, I guess you could improve performance by deleting an entry from its class whenever you have a match.

  • Use sorting beforehand

studentsInClasses=[]
for studentsInClass in classes.values():
  studentsInClasses += studentsInClass

studentsInClasses = sorted(studentsInClasses)
students=sorted(["Alice","Frodo","Jack"])

lastMatch=0
for student in students:
  try:
    # both lists are sorted, so search only from the previous match onwards
    lastMatch = studentsInClasses.index(student, lastMatch)
    print(f"{student} in class")
  except ValueError:
    # student is not in any class
    pass
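
Once the names are sorted, a binary search is another option. The sketch below swaps in the standard-library `bisect` module instead of `list.index`; the helper name `contains` is made up for illustration, and flattening plus sorting does create a new list.

```python
from bisect import bisect_left
from itertools import chain

classes = {
    "class_A": ["Mike", "Alice", "Peter"],
    "class_B": ["Sam", "Robert", "Anna"],
    "class_C": ["Tom", "Nick", "Jack"],
}

# Flatten and sort once, before the lookups start
sorted_names = sorted(chain.from_iterable(classes.values()))

def contains(sorted_names, student):
    # bisect_left returns the leftmost position where `student` could be inserted
    i = bisect_left(sorted_names, student)
    return i < len(sorted_names) and sorted_names[i] == student

students = ["Alice", "Frodo", "Jack"]
found = [s for s in students if contains(sorted_names, s)]
print(found)
```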
