How to remove a list of values from a 2D list in Python efficiently?

Let's say that we have this array:

people = [['Amy', 25], ['Bella', 30], ['Charlie', 29], ['Dean', 21], ['Elliot', 19]]

And I have a list of names that I want to remove from it:

people_rem = ['Amy', 'Charlie', 'Dean']

So that our final array will look like this:

final_people = [['Bella', 30], ['Elliot', 19]]

I have tried doing this with a list comprehension, which works, but it's incredibly slow (not in this specific case, but in my real-life usage I have many lists with far more items):

final_people = [person for person in people if person[0] not in people_rem]

How would I do this in a way that's efficient and fast?

CodePudding user response:

You are using a data structure that supports only linear lookup. You can use the bisect module to do logarithmic-time lookup (deletion will still be linear time), but why bother when there is a structure that lets you do constant-time lookup and deletion?

Use a dictionary:

people = dict(people)

Now removal is trivial:

for name in people_rem:
    del people[name]

Notice that this runs in O(len(people_rem)) time, not O(len(people)). Since presumably len(people_rem) < len(people), this is a good thing (TM). I'm not counting the O(len(people)) conversion to a dictionary, since you can likely do that directly when you create people in the first place, making it no more expensive than building the initial list.
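
One caveat with del: it raises KeyError when a name in people_rem is not a key of the dictionary. Below is a minimal sketch of the same idea that tolerates missing names by using dict.pop with a default, and that converts back to a list of lists in case the rest of your code expects that shape. The extra name 'Zed', the variable people_by_name, and the conversion back to a list are assumptions added for illustration, not part of the answer above.

```python
people = [['Amy', 25], ['Bella', 30], ['Charlie', 29], ['Dean', 21], ['Elliot', 19]]
people_rem = ['Amy', 'Charlie', 'Dean', 'Zed']  # 'Zed' is hypothetical and not in people

# Each inner [name, age] pair becomes one key/value entry.
people_by_name = dict(people)

for name in people_rem:
    # pop with a default removes the key if present and ignores it otherwise,
    # whereas `del people_by_name[name]` would raise KeyError for 'Zed'.
    people_by_name.pop(name, None)

# Dictionaries preserve insertion order (Python 3.7+), so the remaining
# entries come back in their original order.
final_people = [[name, age] for name, age in people_by_name.items()]
print(final_people)
```

Keep in mind that a dictionary assumes the names are unique: if the same name appears twice in people, only the last entry survives the conversion.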

CodePudding user response:

Have you tried doing it through pandas? Check if this is faster.

import pandas as pd

people = [['Amy', 25], ['Bella', 30], ['Charlie', 29], ['Dean', 21], ['Elliot', 19]]

people_rem = ['Amy', 'Charlie', 'Dean']

def remove(people, people_rem):
    df = pd.DataFrame(people, columns=['Name', 'Age'])
    # Keep only the rows whose Name is not in people_rem. This is a single
    # vectorized pass instead of one drop() call per name.
    df = df[~df['Name'].isin(people_rem)]
    return df.values.tolist()

final_people = remove(people, people_rem)
print(final_people)
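
To actually check what is faster, you can time the candidates with the standard-library timeit module. The sketch below is not the pandas code above: it compares the list comprehension from the question against a plain-Python variant that first turns people_rem into a set, so each membership test is a hash lookup rather than a scan. The data is synthetic, and the size n, the function names, and the number of runs are arbitrary assumptions; timings will differ on your data.

```python
import random
import timeit

# Synthetic data: the size and name format are arbitrary assumptions.
n = 1000
people = [[f"name{i}", random.randint(18, 80)] for i in range(n)]
people_rem = [f"name{i}" for i in range(0, n, 2)]  # remove every other person


def filter_with_list(people, people_rem):
    # `in` on a list scans it element by element: O(len(people_rem)) per test.
    return [person for person in people if person[0] not in people_rem]


def filter_with_set(people, people_rem):
    # Build the set once; each membership test is then O(1) on average.
    to_remove = set(people_rem)
    return [person for person in people if person[0] not in to_remove]


# Both variants must agree before comparing their timings means anything.
assert filter_with_list(people, people_rem) == filter_with_set(people, people_rem)

for func in (filter_with_list, filter_with_set):
    seconds = timeit.timeit(lambda: func(people, people_rem), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

Unlike the dictionary approach, the set-based filter keeps the list-of-lists shape and does not require names to be unique. The pandas remove function can be added to the same loop if pandas is installed.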