I have the following dataframe:
ID Type Job
1 Employee Doctor
2 Contingent Worker Doctor
3 Employee Employee
4 Employee Employee
5 Contingent Worker Employee
6 Contingent Worker Consultant
7 Contingent Worker Trainee
8 Contingent Worker SSS
9 Contingent Worker Agency Worker
10 Contingent Worker
And I have this list of possible acceptable values for everyone that has a type of Contingent Workers:
list = ['Agency Worker', 'Consultant']
I need to find a way to confirm if everyone under the type "Contingent Worker" have an accetpable value in "Job" and, if not (or blank value), replace that value for "Consultant" resulting in this dataframe:
ID Type Job
1 Employee Doctor
2 Contingent Worker Consultant
3 Employee Employee
4 Employee Employee
5 Contingent Worker Consultant
6 Contingent Worker Consultant
7 Contingent Worker Consultant
8 Contingent Worker Consultant
9 Contingent Worker Agency Worker
10 Contingent Worker Consultant
What would be the best way to achieve this result?
CodePudding user response:
I would do it following way
df.loc[(df.Type=='Contingent Worker') & ~df.Job.isin(['Agency Worker', 'Consultant']),'Job'] = 'Consultant'
print(df)
gives output
Type Job
ID
1 Employee Doctor
2 Contingent Worker Consultant
3 Employee Employee
4 Employee Employee
5 Contingent Worker Consultant
6 Contingent Worker Consultant
7 Contingent Worker Consultant
8 Contingent Worker Consultant
9 Contingent Worker Agency Worker
10 Contingent Worker Consultant
Explanation: select such rows where Type is Contingent Worker
and (&
) Job is not (~
) one of values (isin
) from your list, select Job column, set value to Consultant
.
CodePudding user response:
You could np.where
for this.
- Select only the slice of relevant entries for column
Job
(so:df['Type']=='Contingent Worker'
insidedf.loc
) and useSeries.isin
to check for each string whether it is found inside your list (here:jobs
). If so, we return the associated value, else, we return "Consultant".
jobs = ['Agency Worker', 'Consultant']
df.loc[df['Type']=='Contingent Worker','Job'] = np.where(
df.loc[df['Type']=='Contingent Worker','Job'].isin(jobs),
df.loc[df['Type']=='Contingent Worker','Job'],
'Consultant')
print(df)
ID Type Job
0 1 Employee Doctor
1 2 Contingent Worker Consultant
2 3 Employee Employee
3 4 Employee Employee
4 5 Contingent Worker Consultant
5 6 Contingent Worker Consultant
6 7 Contingent Worker Consultant
7 8 Contingent Worker Consultant
8 9 Contingent Worker Agency Worker
9 10 Contingent Worker Consultant
N.B. Please don't use "list" as a variable name. Since list
is a built-in data type in Python, doing so will overwrite its functionality. E.g.:
print(type(list))
<class 'type'>
list = ['Agency Worker', 'Consultant']
<class 'list'>
lst = list()
# will now throw an error:
TypeError: 'list' object is not callable