Pandas: How To Select Neighboring Rows Based on Rank-CodePudding

I have a pandas dataframe that looks like so (csv format, the first row is the header):

employee_id, workplace, rank
1, google, 1
1, facebook, 2
1, apple, 3
2, amazon, 1
2, google, 2
2, netflix, 3

employee_id: represents a unique employee ID
workplace: represents a workplace of the employee at some point in time
rank: represents when the employee worked there - for example, "1" means that they currently work there. "2" means that it's their previous workplace. "3" is where they had worked before working at "2". etc.

My goal is to find out where did Google employees work prior to working at Google (the immediate previous workplace)?

For example, the output would be (csv format, the first row is the header):

first_previous_workplace_for_googlers
facebook
netflix

Any ideas?

CodePudding user response：

Assume your dataframe is already sorted by rank:

first_previous_workplace = lambda x: x.eq('google').shift(fill_value=False)
out = df[df.groupby('employee_id')['workplace'].apply(first_previous_workplace)]
print(out)

# Output
   employee_id workplace  rank
1            1  facebook     2
5            2   netflix     3