Creating a function which creates a new column based on the values of other columns in a dataframe

Time:09-28

Here is a simplified version of the DF in question:

  import pandas as pd

  df = pd.DataFrame({'type': ['terrier', 'toy', 'toy', 'toy', 'hound', 'terrier',
                              'terrier', 'terrier', 'terrier', 'hound'],
                     'breed': ['yorkshire_terrier', 'king_charles_spaniel', 'poodle', 'shih_tzu',
                               'greyhound', 'west_highland', 'bull_terrier', 'fox_terrier',
                               'west_highland', 'afghan']})

  df

     type             breed
0   terrier          yorkshire_terrier
1   toy              king_charles_spaniel
2   toy              poodle
3   toy              shih_tzu
4   hound            greyhound
5   terrier          west_highland
6   terrier          bull_terrier
7   terrier          fox_terrier
8   terrier          west_highland
9   hound            afghan

I would like to create a function that takes into consideration both the type and the breed of each dog and assigns it a colour according to the rules in these dictionaries:

toy = {'black' : ['poodle', 'shih_tzu'], 
       'mixed' : 'king_charles_spaniel'}

terrier = {'black_brown' : ['yorkshire_terrier','bull_terrier'],
           'white' : 'west_highland',
           'white_orange' : 'fox_terrier'}

hound = {'brindle' : 'greyhound',
         'brown' : 'afghan'}

Intended DF below:

    type            breed                colour
0   terrier        yorkshire_terrier     black_brown
1   toy            king_charles_spaniel  mixed
2   toy            poodle                black
3   toy            shih_tzu              black
4   hound          greyhound             brindle
5   terrier        west_highland         white
6   terrier        bull_terrier          black_brown
7   terrier        fox_terrier           white_orange
8   terrier        west_highland         white
9   hound          afghan                brown

Please note that I would like the solution to be in the form of a function so I am able to apply the same solution to other DFs of a similar nature.

Please also note that, although the example above does not show it, both type and breed must be taken into consideration to determine colour.

CodePudding user response:

I was able to get the outcome you were looking for by first creating a function to retrieve the key.

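def get_key(dog_type, val):
    # Map each type name to its colour dictionary.
    d = {'toy': toy, 'terrier': terrier, 'hound': hound}
    my_dict = d[dog_type]
    for key, breeds in my_dict.items():
        # A value may be a single breed (str) or a list of breeds;
        # wrap a lone string so `in` does not do substring matching.
        if isinstance(breeds, str):
            breeds = [breeds]
        if val in breeds:
            return key
    return None  # no colour defined for this type/breed pair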

Then apply that function row-wise to your dataframe and store the result in a new column called colour.

df['colour'] = df.apply(lambda row: get_key(row['type'], row['breed']), axis=1)

Note: the breed strings in the dictionaries must match the dataframe values exactly. If a dictionary entry is spelled without the underscore (west highland), rows whose breed is west_highland will return None.
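If the row-wise apply turns out to be slow on larger dataframes, one possible alternative is to flatten the nested dictionaries once into a lookup keyed on (type, breed). The sketch below is only an illustration under that assumption: build_lookup, rules and colours_out are hypothetical names that do not appear in the question, and plain lists stand in for the two dataframe columns.

```python
def build_lookup(rules):
    """Flatten {type: {colour: breed or [breeds]}} into {(type, breed): colour}."""
    lookup = {}
    for dog_type, colour_map in rules.items():
        for colour, breeds in colour_map.items():
            # A value may be one breed (str) or several (list).
            if isinstance(breeds, str):
                breeds = [breeds]
            for breed in breeds:
                lookup[(dog_type, breed)] = colour
    return lookup


# The three dictionaries from the question, grouped by type name.
rules = {
    "toy": {"black": ["poodle", "shih_tzu"], "mixed": "king_charles_spaniel"},
    "terrier": {"black_brown": ["yorkshire_terrier", "bull_terrier"],
                "white": "west_highland",
                "white_orange": "fox_terrier"},
    "hound": {"brindle": "greyhound", "brown": "afghan"},
}
lookup = build_lookup(rules)

# Plain lists standing in for df['type'] and df['breed'].
types = ["terrier", "toy", "hound"]
breeds = ["west_highland", "poodle", "afghan"]

# Unknown (type, breed) pairs fall back to None.
colours_out = [lookup.get(pair) for pair in zip(types, breeds)]
```

With the dataframe from the question, the same idea would read df['colour'] = [lookup.get(pair) for pair in zip(df['type'], df['breed'])]. Because the key includes the type, a breed listed under the wrong type is left without a colour.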

CodePudding user response:

I think there is a typo in your terrier dict (a missing underscore in a breed name).

After this change, this should work:

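def colours(x):
    # Search every type's dictionary; the type column itself is not used here.
    for dog in [hound, toy, terrier]:
        for colour, breeds in dog.items():
            # Wrap a lone breed string in a list so `in` is not a substring test.
            if x in ([breeds] if isinstance(breeds, str) else breeds):
                return colour

df['colour'] = df['breed'].map(colours)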

If you instead had a dictionary named colours that links each breed (key) to its colour (value), you could simply apply:

df['colour']=df['breed'].map(colours)
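Such a flat dictionary could be derived from the three dictionaries in the question rather than typed by hand. This is a minimal sketch, assuming each breed has exactly one colour regardless of type; breed_to_colour is a hypothetical name chosen to avoid clashing with the colours function above.

```python
toy = {"black": ["poodle", "shih_tzu"], "mixed": "king_charles_spaniel"}
terrier = {"black_brown": ["yorkshire_terrier", "bull_terrier"],
           "white": "west_highland",
           "white_orange": "fox_terrier"}
hound = {"brindle": "greyhound", "brown": "afghan"}

breed_to_colour = {}
for colour_map in (toy, terrier, hound):
    for colour, breeds in colour_map.items():
        # Treat a single breed string as a one-item list.
        for breed in ([breeds] if isinstance(breeds, str) else breeds):
            breed_to_colour[breed] = colour
```

It could then be used as df['colour'] = df['breed'].map(breed_to_colour); breeds missing from the dictionary come out as NaN. Note that this version ignores the type column, so it only fits when the breed alone determines the colour.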