I have a dictionary where each key is a color name and each value is a list of color names similar to that key color:
all_colors = {
    'red': ['coral', 'burgundy'],
    'yellow': ['mustard', 'lemon'],
}
I also have a pandas DataFrame:
import pandas as pd
df = pd.DataFrame({
    'market_color': ['red', 'coral', 'burgundy', 'light red',
                     'mustard', 'lemon', 'red'],
    'color_id': [1, 2, 3, 4, 5, 6, 7],
})
I want to count how many times each color name from all_colors, together with its similar colors, appears in the market_color column of the DataFrame.
I expect a final dictionary like this: all_colors_frequencies = {'red': 5, 'yellow': 2}
How can I achieve this?
CodePudding user response:
You can define a function that iterates through the dictionary and returns the key whose name or list of similar colors matches the value:
def categorize(col, color_map):
    # Return the key color that `col` equals or is listed under
    for key, color_list in color_map.items():
        if col == key or col in color_list:
            return key
    # No exact match in any key or list
    return "Unknown"
Then apply that function to the market_color column and use value_counts to get the final count for each key:
df.market_color.apply(lambda col: categorize(col, all_colors)).value_counts()
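The following complete snippet (with 'light red' added to the list for 'red', because the function only matches exact names):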
import pandas as pd

all_colors = {
    'red': ['coral', 'burgundy', 'light red'],
    'yellow': ['mustard', 'lemon'],
}
df = pd.DataFrame({
    'market_color': ['red', 'coral', 'burgundy', 'light red', 'mustard', 'lemon', 'red'],
    'color_id': [1, 2, 3, 4, 5, 6, 7],
})

def categorize(col, color_map):
    # Return the key color that `col` equals or is listed under
    for key, color_list in color_map.items():
        if col == key or col in color_list:
            return key
    return "Unknown"

print(df.market_color.apply(lambda col: categorize(col, all_colors)).value_counts())
Would give the following output:
red 5
yellow 2
Name: market_color, dtype: int64
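Since the question asks for a dictionary rather than a Series, the counts can be converted afterwards. The following is a minimal sketch under the same assumptions as the snippet above ('light red' listed under 'red'). The name all_colors_frequencies is taken from the question, and dropping the "Unknown" bucket is an assumption about what the asker wants:

```python
import pandas as pd

all_colors = {
    "red": ["coral", "burgundy", "light red"],
    "yellow": ["mustard", "lemon"],
}
df = pd.DataFrame({
    "market_color": ["red", "coral", "burgundy", "light red", "mustard", "lemon", "red"],
    "color_id": [1, 2, 3, 4, 5, 6, 7],
})

def categorize(col, color_map):
    # Return the key color that `col` equals or is listed under
    for key, color_list in color_map.items():
        if col == key or col in color_list:
            return key
    return "Unknown"

# Extra positional arguments for the function are passed through `args`
counts = df["market_color"].apply(categorize, args=(all_colors,)).value_counts()

# Convert to a plain dict and drop unmatched colors (assumed to be unwanted)
all_colors_frequencies = {k: int(v) for k, v in counts.items() if k != "Unknown"}
print(all_colors_frequencies)
```

With the original dictionary from the question (without 'light red'), that row would fall into "Unknown" and 'red' would be counted 4 times instead of 5.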
CodePudding user response:
One approach using str.replace and str.extract:
reverse_lookup = {v: k for k, vs in all_colors.items() for v in vs}

def repl(m):
    return reverse_lookup[m.group()]

# map similar colors to key colors
normal = df["market_color"].str.replace("|".join(reverse_lookup), repl=repl, regex=True)

# extract only colors, i.e. light red -> red
colors_only = normal.str.extract(f'({"|".join(all_colors)})', expand=False)

# count and transform to dict
res = colors_only.value_counts().to_dict()
print(res)
Output:
{'red': 5, 'yellow': 2}
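One caveat with building a regex directly from the color names is that a name containing regex metacharacters, or a name that appears inside a longer unrelated word, could match unexpectedly. The following variant is a sketch of one way to guard against that, not part of the answer above: it escapes each name with re.escape, adds word boundaries, and does a single extract followed by a dictionary lookup. The names alternatives, pattern and matched are illustrative.

```python
import re
import pandas as pd

all_colors = {"red": ["coral", "burgundy"], "yellow": ["mustard", "lemon"]}
df = pd.DataFrame({
    "market_color": ["red", "coral", "burgundy", "light red", "mustard", "lemon", "red"],
    "color_id": [1, 2, 3, 4, 5, 6, 7],
})

# Map every similar color to its key color, and every key color to itself
reverse_lookup = {name: key for key, names in all_colors.items() for name in names}
reverse_lookup.update({key: key for key in all_colors})

# Try longer names first so a longer name is preferred over a shorter one inside it
alternatives = sorted(reverse_lookup, key=len, reverse=True)
pattern = r"\b(" + "|".join(re.escape(name) for name in alternatives) + r")\b"

# Extract the first known color name in each value, e.g. 'light red' -> 'red'
matched = df["market_color"].str.extract(pattern, expand=False)

# Translate to key colors, count, and convert to a dict (unmatched rows are NaN and are skipped)
res = matched.map(reverse_lookup).value_counts().to_dict()
print(res)
```

This keeps the substring behavior of the regex approach ('light red' still counts as 'red') while working with the original dictionary from the question.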