I have a dataset like these
Name | Position |
---|---|
Simon | Chef |
David | Assistant Chef |
Philipp | Collegial Chef |
Rafael | Assistant Chef |
Gregory | Collegial Chef |
Dani | Assistant Fellow |
Diego | Collegial Fellow |
Patrick | Assistant Fellow |
Robert | Collegial Fellow |
Glen | Chef |
Michael | Assistant Chef |
William | Collegial Chef |
Carter | Assistant Chef |
Anthony | Collegial Chef |
Timothy | Assistant Fellow |
Jeremy | Collegial Fellow |
Brian | Assistant Fellow |
Ceasar | Collegial Fellow |
Now I would like to add an additional column based on the values of the Position column.
I would like to create a column called role and this column has only 3 values (Chef, Assistant and Collegial)
Everything in the position who is only chef will have in the role the name one. Everything in the position who is something with collegial will have in the role the name collegial. Everything in the position who is something with assistant will have in the role the name assistant.
I am expecting these output
Name | Position | Role |
---|---|---|
Simon | Chef | One |
David | Assistant Chef | Assistant |
Philipp | Collegial Chef | Collegial |
Rafael | Assistant Chef | Assistant |
Gregory | Collegial Chef | Collegial |
Dani | Assistant Fellow | Assistant |
Diego | Collegial Fellow | Collegial |
Patrick | Assistant Fellow | Assistant |
Robert | Collegial Fellow | Collegial |
Glen | Chef | One |
Michael | Assistant Chef | Assistant |
William | Collegial Chef | Collegial |
Carter | Assistant Chef | Assistant |
Anthony | Collegial Chef | Collegial |
Timothy | Assistant Fellow | Assistant |
Jeremy | Collegial Fellow | Collegial |
Brian | Assistant Fellow | Assistant |
Ceasar | Collegial Fellow | Collegial |
I don't know how to code it? Could someone please help. The problem is some names in the column have also chef.
df['Role'] = np.where(df['Position'] == ["Chef"], 'Chef', 'Non Chef')
CodePudding user response:
Option 1
Use Series.str.split
with expand
parameter set to True
. Select the first column from the result and chain Series.replace
to replace "Chef" with "One".
import pandas as pd
data = {'Name': {0: 'Simon', 1: 'David', 2: 'Philipp', 3: 'Rafael',
4: 'Gregory', 5: 'Dani', 6: 'Diego', 7: 'Patrick',
8: 'Robert', 9: 'Glen', 10: 'Michael', 11: 'William',
12: 'Carter', 13: 'Anthony', 14: 'Timothy', 15: 'Jeremy',
16: 'Brian', 17: 'Ceasar'},
'Position': {0: 'Chef', 1: 'Assistant Chef', 2: 'Collegial Chef',
3: 'Assistant Chef', 4: 'Collegial Chef',
5: 'Assistant Fellow', 6: 'Collegial Fellow',
7: 'Assistant Fellow', 8: 'Collegial Fellow',
9: 'Chef', 10: 'Assistant Chef', 11: 'Collegial Chef',
12: 'Assistant Chef', 13: 'Collegial Chef',
14: 'Assistant Fellow', 15: 'Collegial Fellow',
16: 'Assistant Fellow', 17: 'Collegial Fellow'}}
df = pd.DataFrame(data)
df['Role'] = df.Position.str.split(' ', expand=True)[0].replace('Chef','One')
print(df)
Name Position Role
0 Simon Chef One
1 David Assistant Chef Assistant
2 Philipp Collegial Chef Collegial
3 Rafael Assistant Chef Assistant
4 Gregory Collegial Chef Collegial
5 Dani Assistant Fellow Assistant
6 Diego Collegial Fellow Collegial
7 Patrick Assistant Fellow Assistant
8 Robert Collegial Fellow Collegial
9 Glen Chef One
10 Michael Assistant Chef Assistant
11 William Collegial Chef Collegial
12 Carter Assistant Chef Assistant
13 Anthony Collegial Chef Collegial
14 Timothy Assistant Fellow Assistant
15 Jeremy Collegial Fellow Collegial
16 Brian Assistant Fellow Assistant
17 Ceasar Collegial Fellow Collegial
Option 2
Use Series.str.extract
with a regex pattern, and chain Series.fillna
with 'One'.
df['Role'] = df.Position.str.extract(r'(.*)(?=\s\w $)').fillna('One')
# capturing everything followed by a whitespace any word char until end of string
print(df)
# same result