I have a DataFrame with a column called 'color'
, containing a list of colors.
color |
---|
Red |
Yellow |
Green |
Yellow |
Violet |
I've created two lists, primary
and secondary
. I'm trying to iterate through the values in the 'color' column against the two lists and create a new column 'category'
which contains the list name (or the category "primary
and "secondary"
).
primary = ["red","yellow","blue"]
secondary = ["green","violet","orange"]
This is the output I'm looking for.
color | category |
---|---|
Red | primary |
Yellow | primary |
Green | secondary |
Yellow | primary |
Violet | secondary |
I've tried using two np.where
statements but the second statement overwrites the first. I now understand why it's doing that but I'm struggling to find a solution to my problem.
Any suggestions?
CodePudding user response:
You could use numpy.select
(since the words in df
are capitalized but those in the lists aren't, we could align them with the str.lower
method):
colors = df['color'].str.lower()
df['category'] = np.select([colors.isin(primary), colors.isin(secondary)], ['primary', 'secondary'], np.nan)
Output:
color category
0 Red primary
1 Yellow primary
2 Green secondary
3 Yellow primary
4 Violet secondary
CodePudding user response:
With a lambda
function
df['type'] = df.color.apply(lambda col : 'primary' if col.lower() in primary else ('secondary' if col.lower() in secondary else ''))
CodePudding user response:
Try with np.where
:
import numpy as np
color["category"] = np.where(color["color"].str.lower().isin(primary),
"primary",
"secondary")
>>> color
color category
0 Red primary
1 Yellow primary
2 Green secondary
3 Yellow primary
4 Violet secondary
Alternatively, with pandas.where
:
color["category"] = "secondary"
color["category"] = (color["category"].where(~color["color"].str.lower().isin(primary))
.fillna("primary")
)