I have a dataframe column in which every value is a list (one list per cell, with one or more items).
I want to delete the rows where a specific string is found in these lists. For example, a cell can hold a 5-item list; if any one of the items matches the string, the row has to be dropped.
for row in df:
    for count, item in enumerate(df["prescript"]):
        for element in item:
            if "complementary" in element:
                df.drop(row)
df["prescript"] is the column I want to iterate over.
"complementary": if that word is found in a column value, the row has to be dropped.
How can I improve the code above to make it work?
Thanks all
CodePudding user response:
First build a boolean mask of the rows whose list contains the word, using Series.apply:
word = "complementary"
word_is_in = df["prescript"].apply(lambda list_item: word in list_item)
Then use boolean indexing to keep only the rows that don't contain the word, by inverting the boolean Series word_is_in. Note that `word in list_item` is True only when some item in the list equals the word exactly.
df = df[~word_is_in]
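For reference, here is a minimal end-to-end sketch of the two steps above. The sample data and the helper name `has_word` are assumptions made for illustration; only the `prescript` column name comes from the question. The second mask covers the case where the word may appear inside a longer item (as the question's `"complementary" in element` test suggests) rather than as an exact list item.

```python
import pandas as pd

# Hypothetical sample data; only the "prescript" column name is from the question.
df = pd.DataFrame({
    "drug": [1, 2, 3, 4],
    "prescript": [["a", "s"], ["e", "f"], ["e", "a"], ["a", "complementary therapy"]],
})

word = "complementary"

# Exact match: True where the list contains an item equal to `word`.
exact_mask = df["prescript"].apply(lambda items: word in items)

# Substring match: True where any item of the list contains `word`.
def has_word(items, word=word):
    return any(word in str(item) for item in items)

substring_mask = df["prescript"].apply(has_word)

# Keep only the rows that are not flagged by the mask.
filtered = df[~substring_mask]
```

Which mask is appropriate depends on whether the list items are expected to equal the word or merely contain it.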
CodePudding user response:
Impractical solution that may trigger some new learning:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns="index drug prescript".split(),
    data=[
        [0, 1, ['a', 's', 'd', 'f']],
        [1, 2, ['e', 'a', 'e', 'f']],
        [2, 3, ['e', 'a']],
        [3, 4, ['a', 'complementary']],
    ],
).set_index("index", drop=True)

# Explode to one item per row, turn the target word into NaN,
# then keep only the original rows whose items contain no NaN.
df.loc[
    df['prescript'].explode().replace({'complementary': np.nan}).groupby(level=0).agg(lambda x: x.notna().all())
]
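A somewhat shorter variant of the same explode idea, shown as a sketch with made-up sample data: instead of replacing the word with NaN, it compares each exploded item to the word directly and aggregates per original row. The variable names `exploded` and `contains` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "prescript": [["a", "s"], ["e", "a"], ["a", "complementary"]],
})

# One row per list item; the original index label is repeated for each item.
exploded = df["prescript"].explode()

# True for each original row that has at least one item equal to the word.
contains = exploded.eq("complementary").groupby(level=0).any()

# Keep the rows that do not contain the word.
result = df.loc[~contains]
```

This relies on the index being unique, since `groupby(level=0)` collapses the exploded items back onto their original index labels.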