I have a DataFrame with a column, col, whose values are lists of strings. I want to create a new column, col_I_want,
containing only the elements that exactly match an entry in lookfor.
lookfor=["apple", "nectarine"]
   col                                 col_I_want
0  ["apple", "banana", "nectarine"]    ["apple", "nectarine"]
1  ["pear", "banana"]                  np.NaN
If I run the code below, I get the error "numpy array object has no attribute apply":
df['col'].apply(lambda x: list(set(x).intersection(lookfor)))
If I run the code below, I get "pd.Series.__iter__() is not implemented",
since I'm using pandas on Spark:
ps.series(df['col']).apply(lambda x: list(set(x).intersection(lookfor)))
I could convert the column to a string, but that would make it harder to find exact matches.
CodePudding user response:
One can do that with pandas.Series.apply and a custom lambda function, as follows:
import pandas as pd
import numpy as np
df['col_I_want'] = df['col'].apply(lambda x: [i for i in x if i.lower() in [j.lower() for j in lookfor]] if len([i for i in x if i.lower() in [j.lower() for j in lookfor]]) > 0 else np.nan)
[Out]:
col col_I_want
0 [apple, banana, nectarine] [apple, nectarine]
1 [pear, banana] NaN
One can also do it with a list comprehension, as follows:
df['col_I_want'] = [ [i for i in x if i.lower() in [j.lower() for j in lookfor]] if len([i for i in x if i.lower() in [j.lower() for j in lookfor]]) > 0 else np.nan for x in df['col'] ]
[Out]:
col col_I_want
0 [apple, banana, nectarine] [apple, nectarine]
1 [pear, banana] NaN
Notes:
.lower() makes the comparison case-insensitive. For a case-sensitive exact match, drop the .lower() calls.
The trailing if ... else returns NaN instead of an empty list when a row has no matches.
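The one-liners above build the same filtered list twice and rebuild the lowercase version of lookfor for every element. As a minimal sketch (not part of the original answer), the logic can be factored into a small helper. The name match_ignoring_case is hypothetical, and math.nan is used in place of numpy.nan only so that the snippet runs without any dependencies; both are the same float NaN value.

```python
import math

def match_ignoring_case(items, lookfor):
    """Return the items whose lowercase form appears in lookfor, or NaN if none do."""
    # Build the lowercase lookup set once per call instead of once per item
    wanted = {s.lower() for s in lookfor}
    matches = [i for i in items if i.lower() in wanted]
    # An empty list is falsy, so rows without matches become NaN
    return matches if matches else math.nan

lookfor = ["apple", "nectarine"]
rows = [["Apple", "banana", "nectarine"], ["pear", "banana"]]
results = [match_ignoring_case(r, lookfor) for r in rows]
```

With a DataFrame, the same helper could be used as df['col'].apply(lambda x: match_ignoring_case(x, lookfor)).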
CodePudding user response:
I changed the solution to use a custom function that returns NaN if there is no match:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [['apple', 'banana', 'nectarine'], ['pear', 'banana']]})
lookfor = ["apple", "nectarine"]
def f(x):
    # Exact (case-sensitive) matches; set intersection does not preserve order
    y = list(set(x).intersection(lookfor))
    return np.nan if len(y) == 0 else y
df['col_I_want'] = df['col'].apply(f)
print(df)
col col_I_want
0 [apple, banana, nectarine] [nectarine, apple]
1 [pear, banana] NaN
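Note that the set intersection does not preserve the original order of the elements, which is why the output above shows [nectarine, apple]. If the order matters, an order-preserving variant could look like the sketch below. It is only an illustration: the name exact_matches_in_order is hypothetical, and math.nan stands in for np.nan so that the snippet has no dependencies.

```python
import math

def exact_matches_in_order(items, lookfor):
    """Return the items found in lookfor (case-sensitive), in their original order, or NaN."""
    wanted = set(lookfor)  # set membership keeps each lookup cheap
    matches = [i for i in items if i in wanted]
    # Rows without any match become NaN rather than an empty list
    return matches if matches else math.nan

lookfor = ["apple", "nectarine"]
rows = [["apple", "banana", "nectarine"], ["pear", "banana"]]
results = [exact_matches_in_order(r, lookfor) for r in rows]
```

Unlike the set-based version, this keeps duplicates if an element appears more than once in a row. It could be applied in the same way, with df['col'].apply(lambda x: exact_matches_in_order(x, lookfor)).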