I have a pandas dataframe that looks like so (csv
format, the first row is the header):
employee_id, workplace, rank
1, google, 1
1, facebook, 2
1, apple, 3
2, amazon, 1
2, google, 2
2, netflix, 3
employee_id
: represents a unique employee IDworkplace
: represents a workplace of the employee at some point in timerank
: represents when the employee worked there - for example, "1" means that they currently work there. "2" means that it's their previous workplace. "3" is where they had worked before working at "2". etc.
My goal is to find out where did Google employees work prior to working at Google (the immediate previous workplace)?
For example, the output would be (csv
format, the first row is the header):
first_previous_workplace_for_googlers
facebook
netflix
Any ideas?
CodePudding user response:
Assume your dataframe is already sorted by rank
:
first_previous_workplace = lambda x: x.eq('google').shift(fill_value=False)
out = df[df.groupby('employee_id')['workplace'].apply(first_previous_workplace)]
print(out)
# Output
employee_id workplace rank
1 1 facebook 2
5 2 netflix 3