Home > Mobile >  Add new column based on values of another column in python
Add new column based on values of another column in python

Time:11-02

I have a dataset like these

Name Position
Simon Chef
David Assistant Chef
Philipp Collegial Chef
Rafael Assistant Chef
Gregory Collegial Chef
Dani Assistant Fellow
Diego Collegial Fellow
Patrick Assistant Fellow
Robert Collegial Fellow
Glen Chef
Michael Assistant Chef
William Collegial Chef
Carter Assistant Chef
Anthony Collegial Chef
Timothy Assistant Fellow
Jeremy Collegial Fellow
Brian Assistant Fellow
Ceasar Collegial Fellow

Now I would like to add an additional column based on the values of the Position column.

I would like to create a column called role and this column has only 3 values (Chef, Assistant and Collegial)

Everything in the position who is only chef will have in the role the name one. Everything in the position who is something with collegial will have in the role the name collegial. Everything in the position who is something with assistant will have in the role the name assistant.

I am expecting these output

Name Position Role
Simon Chef One
David Assistant Chef Assistant
Philipp Collegial Chef Collegial
Rafael Assistant Chef Assistant
Gregory Collegial Chef Collegial
Dani Assistant Fellow Assistant
Diego Collegial Fellow Collegial
Patrick Assistant Fellow Assistant
Robert Collegial Fellow Collegial
Glen Chef One
Michael Assistant Chef Assistant
William Collegial Chef Collegial
Carter Assistant Chef Assistant
Anthony Collegial Chef Collegial
Timothy Assistant Fellow Assistant
Jeremy Collegial Fellow Collegial
Brian Assistant Fellow Assistant
Ceasar Collegial Fellow Collegial

I don't know how to code it? Could someone please help. The problem is some names in the column have also chef.

df['Role'] = np.where(df['Position'] == ["Chef"], 'Chef', 'Non Chef')

CodePudding user response:

Option 1

Use Series.str.split with expand parameter set to True. Select the first column from the result and chain Series.replace to replace "Chef" with "One".

import pandas as pd

data = {'Name': {0: 'Simon', 1: 'David', 2: 'Philipp', 3: 'Rafael', 
                 4: 'Gregory', 5: 'Dani', 6: 'Diego', 7: 'Patrick', 
                 8: 'Robert', 9: 'Glen', 10: 'Michael', 11: 'William', 
                 12: 'Carter', 13: 'Anthony', 14: 'Timothy', 15: 'Jeremy', 
                 16: 'Brian', 17: 'Ceasar'}, 
        'Position': {0: 'Chef', 1: 'Assistant Chef', 2: 'Collegial Chef', 
                     3: 'Assistant Chef', 4: 'Collegial Chef', 
                     5: 'Assistant Fellow', 6: 'Collegial Fellow', 
                     7: 'Assistant Fellow', 8: 'Collegial Fellow', 
                     9: 'Chef', 10: 'Assistant Chef', 11: 'Collegial Chef', 
                     12: 'Assistant Chef', 13: 'Collegial Chef', 
                     14: 'Assistant Fellow', 15: 'Collegial Fellow', 
                     16: 'Assistant Fellow', 17: 'Collegial Fellow'}}
df = pd.DataFrame(data)

df['Role'] = df.Position.str.split(' ', expand=True)[0].replace('Chef','One')

print(df)

       Name          Position       Role
0     Simon              Chef        One
1     David    Assistant Chef  Assistant
2   Philipp    Collegial Chef  Collegial
3    Rafael    Assistant Chef  Assistant
4   Gregory    Collegial Chef  Collegial
5      Dani  Assistant Fellow  Assistant
6     Diego  Collegial Fellow  Collegial
7   Patrick  Assistant Fellow  Assistant
8    Robert  Collegial Fellow  Collegial
9      Glen              Chef        One
10  Michael    Assistant Chef  Assistant
11  William    Collegial Chef  Collegial
12   Carter    Assistant Chef  Assistant
13  Anthony    Collegial Chef  Collegial
14  Timothy  Assistant Fellow  Assistant
15   Jeremy  Collegial Fellow  Collegial
16    Brian  Assistant Fellow  Assistant
17   Ceasar  Collegial Fellow  Collegial

Option 2

Use Series.str.extract with a regex pattern, and chain Series.fillna with 'One'.

df['Role'] = df.Position.str.extract(r'(.*)(?=\s\w $)').fillna('One')
# capturing everything followed by a whitespace   any word char until end of string
print(df)
# same result
  • Related