Add new column based on values of another column in python-CodePudding

I have a dataset like these

Name	Position
Simon	Chef
David	Assistant Chef
Philipp	Collegial Chef
Rafael	Assistant Chef
Gregory	Collegial Chef
Dani	Assistant Fellow
Diego	Collegial Fellow
Patrick	Assistant Fellow
Robert	Collegial Fellow
Glen	Chef
Michael	Assistant Chef
William	Collegial Chef
Carter	Assistant Chef
Anthony	Collegial Chef
Timothy	Assistant Fellow
Jeremy	Collegial Fellow
Brian	Assistant Fellow
Ceasar	Collegial Fellow

Now I would like to add an additional column based on the values of the Position column.

I would like to create a column called role and this column has only 3 values (Chef, Assistant and Collegial)

Everything in the position who is only chef will have in the role the name one. Everything in the position who is something with collegial will have in the role the name collegial. Everything in the position who is something with assistant will have in the role the name assistant.

I am expecting these output

Name	Position	Role
Simon	Chef	One
David	Assistant Chef	Assistant
Philipp	Collegial Chef	Collegial
Rafael	Assistant Chef	Assistant
Gregory	Collegial Chef	Collegial
Dani	Assistant Fellow	Assistant
Diego	Collegial Fellow	Collegial
Patrick	Assistant Fellow	Assistant
Robert	Collegial Fellow	Collegial
Glen	Chef	One
Michael	Assistant Chef	Assistant
William	Collegial Chef	Collegial
Carter	Assistant Chef	Assistant
Anthony	Collegial Chef	Collegial
Timothy	Assistant Fellow	Assistant
Jeremy	Collegial Fellow	Collegial
Brian	Assistant Fellow	Assistant
Ceasar	Collegial Fellow	Collegial

I don't know how to code it? Could someone please help. The problem is some names in the column have also chef.

df['Role'] = np.where(df['Position'] == ["Chef"], 'Chef', 'Non Chef')

CodePudding user response：

Option 1

Use Series.str.split with expand parameter set to True. Select the first column from the result and chain Series.replace to replace "Chef" with "One".

import pandas as pd

data = {'Name': {0: 'Simon', 1: 'David', 2: 'Philipp', 3: 'Rafael', 
                 4: 'Gregory', 5: 'Dani', 6: 'Diego', 7: 'Patrick', 
                 8: 'Robert', 9: 'Glen', 10: 'Michael', 11: 'William', 
                 12: 'Carter', 13: 'Anthony', 14: 'Timothy', 15: 'Jeremy', 
                 16: 'Brian', 17: 'Ceasar'}, 
        'Position': {0: 'Chef', 1: 'Assistant Chef', 2: 'Collegial Chef', 
                     3: 'Assistant Chef', 4: 'Collegial Chef', 
                     5: 'Assistant Fellow', 6: 'Collegial Fellow', 
                     7: 'Assistant Fellow', 8: 'Collegial Fellow', 
                     9: 'Chef', 10: 'Assistant Chef', 11: 'Collegial Chef', 
                     12: 'Assistant Chef', 13: 'Collegial Chef', 
                     14: 'Assistant Fellow', 15: 'Collegial Fellow', 
                     16: 'Assistant Fellow', 17: 'Collegial Fellow'}}
df = pd.DataFrame(data)

df['Role'] = df.Position.str.split(' ', expand=True)[0].replace('Chef','One')

print(df)

       Name          Position       Role
0     Simon              Chef        One
1     David    Assistant Chef  Assistant
2   Philipp    Collegial Chef  Collegial
3    Rafael    Assistant Chef  Assistant
4   Gregory    Collegial Chef  Collegial
5      Dani  Assistant Fellow  Assistant
6     Diego  Collegial Fellow  Collegial
7   Patrick  Assistant Fellow  Assistant
8    Robert  Collegial Fellow  Collegial
9      Glen              Chef        One
10  Michael    Assistant Chef  Assistant
11  William    Collegial Chef  Collegial
12   Carter    Assistant Chef  Assistant
13  Anthony    Collegial Chef  Collegial
14  Timothy  Assistant Fellow  Assistant
15   Jeremy  Collegial Fellow  Collegial
16    Brian  Assistant Fellow  Assistant
17   Ceasar  Collegial Fellow  Collegial

Option 2

Use Series.str.extract with a regex pattern, and chain Series.fillna with 'One'.

df['Role'] = df.Position.str.extract(r'(.*)(?=\s\w $)').fillna('One')
# capturing everything followed by a whitespace   any word char until end of string
print(df)
# same result