I have a CSV with information about product colors. Because the color field sometimes contains extra details, I would like to extract just the color name. I found some libraries for this, but my data is in French, so they don't fit. I am trying to do this with Python.
For example, from "transparent blue" I want to keep just "blue".
The table looks like this:
Product ref | Color | Sales quantity |
---|---|---|
F33 | Bleu transparent | 2 |
K367 | Ecaille Marron | 1 |
I want to extract "Bleu" (blue) and "Marron" (brown) so I can see which colors sell the most.
CodePudding user response:
You could write a translation function and then apply it to the column.
Here is an example (using the data from the question):
import pandas as pd

# original dataframe
data = {'Product ref': ['F33', 'K367'],
        'Color': ['Bleu transparent', 'Ecaille Marron'],
        'Sales quantity': [2, 1]}
df = pd.DataFrame(data)

def translate(french):
    ''' translating function '''
    if 'Bleu' in french:
        return 'blue'
    if 'Marron' in french:
        return 'brown'
    return '-'

# apply the result
df['english'] = df['Color'].apply(translate)
print(df)
This is the result:
   Product ref             Color  Sales quantity english
0          F33  Bleu transparent               2    blue
1         K367    Ecaille Marron               1   brown
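To get from the translated column to the actual goal in the question (which colors sell the most), one option is to group by that column and sum the quantities. This is only a minimal sketch, assuming the `english` column has been filled in as in the result above; the variable name `sales_by_color` is made up for illustration.

```python
import pandas as pd

# Same data as in the result above, with the translated column already filled in
df = pd.DataFrame({
    'Product ref': ['F33', 'K367'],
    'Color': ['Bleu transparent', 'Ecaille Marron'],
    'Sales quantity': [2, 1],
    'english': ['blue', 'brown'],
})

# Sum the quantities per color and sort so the best seller comes first
sales_by_color = (
    df.groupby('english')['Sales quantity']
      .sum()
      .sort_values(ascending=False)
)
print(sales_by_color)
```

Rows that the function could not translate end up grouped under '-', which makes it easy to spot colors that still need a rule.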
Note:
You could use a much more sophisticated translation and matching function (for example, googletrans). The example above is just a minimal working example.
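Short of a full translation library, a lookup table may scale better than a chain of `if` statements once there are more than a few colors. The following is a sketch under stated assumptions, not part of the original answer: the dictionary `FRENCH_COLORS` is hypothetical and incomplete, and the helpers `strip_accents` and `extract_color` are names invented here. It matches whole words, ignoring case and accents.

```python
import re
import unicodedata

import pandas as pd

# Hypothetical, incomplete lookup table: extend it with the colors in your data
FRENCH_COLORS = {
    'bleu': 'blue',
    'marron': 'brown',
    'rouge': 'red',
    'vert': 'green',
    'noir': 'black',
}

def strip_accents(text):
    '''Lowercase the text and drop accents, e.g. "Écaille" -> "ecaille".'''
    decomposed = unicodedata.normalize('NFKD', text.lower())
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def extract_color(french):
    '''Return the first known color word found in the text, or '-' if none.'''
    for word in re.findall(r'[a-z]+', strip_accents(french)):
        if word in FRENCH_COLORS:
            return FRENCH_COLORS[word]
    return '-'

# The third row is an invented example to show case and accent handling
df = pd.DataFrame({'Color': ['Bleu transparent', 'Ecaille Marron', 'ROUGE foncé']})
df['english'] = df['Color'].apply(extract_color)
print(df)
```

Because the match is on whole words, inflected forms such as "bleue" or "verte" are not recognized unless they are added to the table as separate keys.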