I am trying to carry out what should be a fairly simple procedure in Python, but I am having trouble searching for help because I don't know how to put it into searchable terms. I am not sure whether it is called reclassifying, using a conditional statement, or something else. Here is an example of what I am trying to do. I have the following DataFrame:
Color Value
----------------
blue 43
blue 53
blue 25
orange 44
orange 33
orange 35
red 66
red 43
red 65
green 44
green 35
green 24
green 34
Now I want to categorize these colors as primary or secondary, where blue and red are primary colors and orange and green are secondary colors. So I want to create the following DataFrame:
Color Value Category
------------------------------
blue 43 Primary
blue 53 Primary
blue 25 Primary
orange 44 Secondary
orange 33 Secondary
orange 35 Secondary
red 66 Primary
red 43 Primary
red 65 Primary
green 44 Secondary
green 35 Secondary
green 24 Secondary
green 34 Secondary
I am not sure whether this requires creating a dictionary or whether I can just apply a simple conditional statement to my DataFrame. How can this be done in Python?
CodePudding user response:
You can use a simple np.where (this needs import numpy as np):
df['Category'] = np.where(df['Color'].str.contains('blue|red'), 'Primary', 'Secondary')
or
df['Category'] = df['Color'].str.contains('blue|red').map({True: 'Primary', False: 'Secondary'})
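One thing to keep in mind is that str.contains matches substrings (and treats the pattern as a regex), so a value that merely contains 'blue' or 'red' would also be labelled Primary. A minimal sketch of an exact-match variant using Series.isin follows; the 'blue-green' row and the primary list are made up for illustration and are not part of the question's data:

```python
import numpy as np
import pandas as pd

# Small illustrative frame; 'blue-green' is a hypothetical value
# added only to show the difference between the two approaches.
df = pd.DataFrame({
    'Color': ['blue', 'orange', 'red', 'green', 'blue-green'],
    'Value': [43, 44, 66, 44, 10],
})

primary = ['blue', 'red']  # assumed list of primary colours for this sketch

# str.contains does substring/regex matching, so 'blue-green' also matches
by_contains = np.where(df['Color'].str.contains('blue|red'), 'Primary', 'Secondary')

# isin compares whole values, so only exactly 'blue' or 'red' match
by_isin = np.where(df['Color'].isin(primary), 'Primary', 'Secondary')

df['Category'] = by_isin
```

If the column only ever holds the exact colour names, both approaches should give the same result.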
CodePudding user response:
Assuming we want to categorize every colour that falls into one of these categories, the easiest way is to establish a mapping:
colors = {
    'Primary': ['red', 'blue', 'yellow'],
    'Secondary': ['orange', 'purple', 'green']
}
*Note: the dictionary is built this way for convenience, since it assumes there are more colours than categories.
We can then reformat it into a valid mapper for Series.map with a dictionary comprehension:
color_map = {k: v for v, lst in colors.items() for k in lst}
df['Category'] = df['Color'].map(color_map)
df:
Color Value Category
0 blue 43 Primary
1 blue 53 Primary
2 blue 25 Primary
3 orange 44 Secondary
4 orange 33 Secondary
5 orange 35 Secondary
6 red 66 Primary
7 red 43 Primary
8 red 65 Primary
9 green 44 Secondary
10 green 35 Secondary
11 green 24 Secondary
12 green 34 Secondary
color_map for reference (this is the format the dictionary needs in order to work with Series.map; however, it is less human-readable than the colors dictionary's format):
{'red': 'Primary', 'blue': 'Primary', 'yellow': 'Primary',
'orange': 'Secondary', 'purple': 'Secondary', 'green': 'Secondary'}
We can also chain str.lower if we expect mixed casing in the Color column:
df['Category'] = df['Color'].str.lower().map(color_map)
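Series.map returns NaN for any value that is not a key in the mapping, so colours outside the dictionary end up with a missing Category. A minimal sketch of one way to handle that, assuming a placeholder label such as 'Unknown' is acceptable (the label, the stray whitespace, and the 'teal' row are illustrative assumptions, not part of the original data):

```python
import pandas as pd

# Same mapping as shown above, written out so the sketch is self-contained
color_map = {'red': 'Primary', 'blue': 'Primary', 'yellow': 'Primary',
             'orange': 'Secondary', 'purple': 'Secondary', 'green': 'Secondary'}

# Hypothetical messy input: mixed case, stray whitespace, an unmapped colour
df = pd.DataFrame({
    'Color': ['Blue', ' red', 'green', 'teal'],
    'Value': [43, 66, 44, 12],
})

# Normalise the text before looking it up in the mapping
cleaned = df['Color'].str.strip().str.lower()

# Unmapped colours come back as NaN from map; replace them with a label
df['Category'] = cleaned.map(color_map).fillna('Unknown')
```

Leaving the NaN in place instead of filling it is also reasonable if you would rather spot unmapped colours with df['Category'].isna().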
Setup and imports:
import pandas as pd
df = pd.DataFrame({
    'Color': ['blue', 'blue', 'blue', 'orange', 'orange', 'orange', 'red',
              'red', 'red', 'green', 'green', 'green', 'green'],
    'Value': [43, 53, 25, 44, 33, 35, 66, 43, 65, 44, 35, 24, 34]
})