I have a DataFrame with a names columns which is a list of names, and I want to drop all rows if a name is not in the list.
I wrote :
df["John" in df.names]
But this doesn't work at all.
Is there a way to do so ? I couldn't find any answer over the Internet.
Thanks !
CodePudding user response:
Use lambda function with in
:
df1 = df[df.names.map(lambda x: "John" in x)]
#alternative
df1 = df[df.names.apply(lambda x: "John" in x)]
Or list comprehension:
df1 = df[["John" in x for x in df.names]]
Or use set
intersection:
df1 = df[df.names.map(set) & set(['John'])]
Or use DataFrame
constructor:
df1 = df[pd.DataFrame(df.names.tolist(), index=df.index).eq('John').any(axis=1)]
Performance in small 2k rows, best test in real data:
df = pd.DataFrame({'value': [10, 11], 'names': [['John', 'Paul'], ['Louis', 'Alex']]})
df = pd.concat([df] * 1000, ignore_index=True)
print (df)
In [61]: %timeit df[["John" in x for x in df.names]]
863 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [62]: %timeit df[df.names.map(lambda x: "John" in x)]
834 µs ± 3.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [63]: %timeit df[df.names.apply(lambda x: "John" in x)]
910 µs ± 72.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [65]: %timeit df[df.names.map(set) & set(['John'])]
1.66 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [66]: %timeit df[pd.DataFrame(df.names.tolist(), index=df.index).eq('John').any(axis=1)]
1.49 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [67]: %timeit df.loc[df['names'].explode().eq('John').loc[lambda x: x].index]
2.47 ms ± 70.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
CodePudding user response:
You can explode your list of names and keep rows that contain the name:
out = df.loc[df['names'].explode().eq('John').loc[lambda x: x].index]
print(out)
# Output
value names
0 10 [John, Paul]
Setup:
data = {'value': [10, 11],
'names': [['John', 'Paul'],
['Louis', 'Alex']]}
df = pd.DataFrame(data)
print(df)
# Output
value names
0 10 [John, Paul]
1 11 [Louis, Alex]