Here is a simplified version of the DF in question:
import pandas as pd

df = pd.DataFrame({'type': ['terrier', 'toy', 'toy', 'toy', 'hound', 'terrier',
                            'terrier', 'terrier', 'terrier', 'hound'],
                   'breed': ['yorkshire_terrier', 'king_charles_spaniel', 'poodle', 'shih_tzu',
                             'greyhound', 'west_highland', 'bull_terrier', 'fox_terrier',
                             'west_highland', 'afghan']})
df
type breed
0 terrier yorkshire_terrier
1 toy king_charles_spaniel
2 toy poodle
3 toy shih_tzu
4 hound greyhound
5 terrier west_highland
6 terrier bull_terrier
7 terrier fox_terrier
8 terrier west_highland
9 hound afghan
I would like to create a function that takes into consideration both the type and the breed of each dog and assigns it a colour based on the rules in these dictionaries:
toy = {'black': ['poodle', 'shih_tzu'],
       'mixed': 'king_charles_spaniel'}
terrier = {'black_brown': ['yorkshire_terrier', 'bull_terrier'],
           'white': 'west_highland',
           'white_orange': 'fox_terrier'}
hound = {'brindle': 'greyhound',
         'brown': 'afghan'}
Intended DF below:
type breed colour
0 terrier yorkshire_terrier black_brown
1 toy king_charles_spaniel mixed
2 toy poodle black
3 toy shih_tzu black
4 hound greyhound brindle
5 terrier west_highland white
6 terrier bull_terrier black_brown
7 terrier fox_terrier white_orange
8 terrier west_highland white
9 hound afghan brown
Please note that I would like the solution to be in the form of a function so that I can apply it to other DFs of a similar nature.
Please also note that, although the example does not currently show it, both type and breed must be taken into account to determine the colour.
Answer:
I was able to get the outcome you were looking for by first creating a function that retrieves the key (the colour) for a given type and breed.
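def get_key(dog_type, val):
    # Map each dog type to its colour dictionary (toy, terrier, hound defined above)
    d = {'toy': toy, 'terrier': terrier, 'hound': hound}
    my_dict = d[dog_type]
    for key, breeds in my_dict.items():
        # Some values are single strings; wrap them so 'in' is not a substring test
        if isinstance(breeds, str):
            breeds = [breeds]
        if val in breeds:
            return key
    # No colour found for this type/breed combination
    return None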
Then apply that function row-wise to your dataframe to fill a new column called colour:
df['colour'] = df.apply(lambda row: get_key(row['type'], row['breed']), axis=1)
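Since the question asks for something reusable on other DataFrames, the same idea could also be wrapped in one function that takes the rules as a parameter. This is a sketch, not the answerer's code: the names `build_lookup`, `assign_colour` and the `rules`, `type_col`, `breed_col` parameters are hypothetical. It flattens the nested dictionaries into a lookup keyed by `(type, breed)` pairs, so both columns decide the colour, and accepts either a single string or a list of breeds as a value.

```python
import pandas as pd

# Colour rules from the question; values may be a single breed or a list of breeds
toy = {'black': ['poodle', 'shih_tzu'],
       'mixed': 'king_charles_spaniel'}
terrier = {'black_brown': ['yorkshire_terrier', 'bull_terrier'],
           'white': 'west_highland',
           'white_orange': 'fox_terrier'}
hound = {'brindle': 'greyhound',
         'brown': 'afghan'}


def build_lookup(rules):
    """Flatten {type: {colour: breed_or_breeds}} into {(type, breed): colour}."""
    lookup = {}
    for dog_type, colours in rules.items():
        for colour, breeds in colours.items():
            # Wrap single strings so they are not iterated character by character
            if isinstance(breeds, str):
                breeds = [breeds]
            for breed in breeds:
                lookup[(dog_type, breed)] = colour
    return lookup


def assign_colour(frame, rules, type_col='type', breed_col='breed'):
    """Return a copy of `frame` with a 'colour' column; unmatched rows get None."""
    lookup = build_lookup(rules)
    out = frame.copy()
    out['colour'] = [lookup.get((t, b)) for t, b in zip(out[type_col], out[breed_col])]
    return out


df = pd.DataFrame({'type': ['terrier', 'toy', 'toy', 'toy', 'hound', 'terrier',
                            'terrier', 'terrier', 'terrier', 'hound'],
                   'breed': ['yorkshire_terrier', 'king_charles_spaniel', 'poodle',
                             'shih_tzu', 'greyhound', 'west_highland', 'bull_terrier',
                             'fox_terrier', 'west_highland', 'afghan']})

result = assign_colour(df, {'toy': toy, 'terrier': terrier, 'hound': hound})
print(result)
```

Building the lookup once and reusing it avoids rescanning every dictionary for each row, and passing `rules` in lets the same function serve other DataFrames with different rule sets.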
Note: in your given dictionary, west highland doesn't have an underscore, so any breed entries with west_highland will return None.
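Because unmatched rows silently end up as None, it may help to list them after assigning colours. A minimal sketch, assuming a hypothetical terrier dictionary that reproduces the missing-underscore typo; `lookup_colour` is an illustrative helper, not the answer's `get_key`.

```python
import pandas as pd

# Hypothetical rules reproducing the typo: 'west highland' without the underscore
terrier = {'black_brown': ['yorkshire_terrier', 'bull_terrier'],
           'white': 'west highland',
           'white_orange': 'fox_terrier'}

df = pd.DataFrame({'type': ['terrier', 'terrier', 'terrier'],
                   'breed': ['yorkshire_terrier', 'west_highland', 'fox_terrier']})


def lookup_colour(breed, rules):
    # Return the first colour whose breed entry matches exactly, else None
    for colour, breeds in rules.items():
        if isinstance(breeds, str):
            breeds = [breeds]
        if breed in breeds:
            return colour
    return None


df['colour'] = df['breed'].map(lambda b: lookup_colour(b, terrier))

# Rows whose breed found no colour; isna() treats None as missing
unmatched = df[df['colour'].isna()]
print(unmatched)
```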
Answer:
I think there is a typo in your terrier dict (one breed is missing an underscore).
After fixing it, this should work:
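def colours(x):
    # Search every colour dictionary for the breed (the dog type is not used)
    for dog in [hound, toy, terrier]:
        for colour, breeds in dog.items():
            # Treat single-string values as one-element lists to avoid substring matches
            if isinstance(breeds, str):
                breeds = [breeds]
            if x in breeds:
                return colour

df['colour'] = df['breed'].map(colours)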
If you had a dictionary colours mapping each breed (key) to its colour, you could simply apply:
df['colour'] = df['breed'].map(colours)
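Such a flat dictionary could be built from the three per-type dictionaries with a short loop. This sketch assumes each breed appears under only one type: a flat breed-to-colour mapping cannot express type-dependent colours, and if a breed were listed twice the later dictionary would overwrite the earlier one. The name `breed_colours` is hypothetical, chosen to avoid clashing with the `colours` function.

```python
import pandas as pd

toy = {'black': ['poodle', 'shih_tzu'],
       'mixed': 'king_charles_spaniel'}
terrier = {'black_brown': ['yorkshire_terrier', 'bull_terrier'],
           'white': 'west_highland',
           'white_orange': 'fox_terrier'}
hound = {'brindle': 'greyhound',
         'brown': 'afghan'}

# Flatten the per-type rules into a single breed -> colour dictionary
breed_colours = {}
for rules in (toy, terrier, hound):
    for colour, breeds in rules.items():
        # Single breeds are stored as plain strings, so wrap them in a list
        for breed in ([breeds] if isinstance(breeds, str) else breeds):
            breed_colours[breed] = colour

df = pd.DataFrame({'breed': ['poodle', 'west_highland', 'afghan', 'beagle']})
# Series.map with a dict leaves breeds that are not in the dict as NaN
df['colour'] = df['breed'].map(breed_colours)
print(df)
```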