I want to create a new column based on a condition that have to be applied on a list. Here's a reproducible example:
import pandas as pd
df = pd.DataFrame(
{
"ID": [1, 2, 3, 4, 5],
"BRAND": [[], ["LVH"], ["FER", "MER", "POR"], ["WDC", "AUD", "LVH"], ["AST"]]
}
)
print(df)
ID BRAND
0 1 []
1 2 [LVH]
2 3 [FER, MER, POR]
3 4 [WDC, AUD, LVH]
4 5 [AST]
As one can see, each object in the BRAND
column is a list that can contain one or more elements (the list can also be empty as for the row where ID = 1
).
Now, given the following target list target_list = ["LVH", "WDC"]
, my goal is to create a new column based on the following rule: if at least one element of target_list
(i.e. either LVH
or WDC
) is present in the BRAND
column value, then assign a flag equal to Y
in a new column (otherwise assign N
). The resulting DataFrame for the above example should look as follows:
ID BRAND FLAG
0 1 [] N
1 2 [LVH] Y
2 3 [FER, MER, POR] N
3 4 [WDC, AUD, LVH] Y
4 5 [AST] N
CodePudding user response:
Option 1
Seems to be a bit faster on a larger set than Option 2 below:
df['FLAG'] = df.BRAND.explode().isin(target_list).groupby(level=0, sort=False)\
.any().map({True:'Y',False:'N'})
print(df)
ID BRAND FLAG
0 1 [] N
1 2 [LVH] Y
2 3 [FER, MER, POR] N
3 4 [WDC, AUD, LVH] Y
4 5 [AST] N
Explanation:
- Use
Series.explode
to "[t]ransform each element of a list-like to a row". - Check for matches with
Series.isin
, and getTrue
orFalse
. - We now have a series with duplicate rows, so use
Series.groupby
to isolate the groups, applyany
, and get apd.Series
back with booleans in the correct shape. - Finally, use
Series.map
to turnFalse
andTrue
into"N"
and"Y"
respectively.
Option 2:
Basically same performance as the answer by @AnoushiravanR
df['FLAG'] = df.BRAND.apply(lambda x: 'Y' if len(set(x) & set(target_list))
else 'N')
print(df)
ID BRAND FLAG
0 1 [] N
1 2 [LVH] Y
2 3 [FER, MER, POR] N
3 4 [WDC, AUD, LVH] Y
4 5 [AST] N
Explanation: set(list_a) & set(list_b)
being a shorthand for set_a.intersection(set_b)
, which we pass to len()
. If len(...) == 0
, this will result in False
.
CodePudding user response:
Another very similar approach could be:
df['Flag'] = df.BRAND.transform(lambda x: 'Y' if any(mot in x for mot in ['LVH', 'WDC']) else 'N')
ID BRAND Flag
0 1 [] N
1 2 [LVH] Y
2 3 [FER, MER, POR] N
3 4 [WDC, AUD, LVH] Y
4 5 [AST] N