Pandas: How To Select Neighboring Rows Based on Rank

Time:01-03

I have a pandas dataframe that looks like so (csv format, the first row is the header):

employee_id, workplace, rank
1, google, 1
1, facebook, 2
1, apple, 3
2, amazon, 1
2, google, 2
2, netflix, 3
  1. employee_id: represents a unique employee ID
  2. workplace: represents a workplace of the employee at some point in time
  3. rank: represents when the employee worked there. For example, "1" means that they currently work there, "2" means that it's their previous workplace, "3" is where they worked before "2", and so on.
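To reproduce this, the sample can be loaded with pd.read_csv. This is a minimal sketch that assumes the CSV is pasted in as a string (the name csv_text is just for illustration). Because the sample has a space after every comma, skipinitialspace=True is used so the columns come out as 'workplace' and 'rank' rather than ' workplace' and ' rank'.

```python
import io

import pandas as pd

# Sample data from the question; note the space after each comma
csv_text = """employee_id, workplace, rank
1, google, 1
1, facebook, 2
1, apple, 3
2, amazon, 1
2, google, 2
2, netflix, 3
"""

# skipinitialspace=True drops the blank after each delimiter,
# so both headers and values are read without a leading space
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(df.columns.tolist())
```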

My goal is to find out where Google employees worked immediately before working at Google (i.e., their immediate previous workplace).

For example, the output would be (csv format, the first row is the header):

first_previous_workplace_for_googlers
facebook
netflix

Any ideas?

Answer:

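Assuming your dataframe is already sorted by rank within each employee_id (rank 1 first), flag each 'google' row and shift the flag down one row inside each group. The flag then lands on the next rank, which is the workplace held just before Google: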

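# Within each employee, flag the row right after 'google' (the job held just before Google)
first_previous_workplace = lambda x: x.eq('google').shift(fill_value=False)
# transform (not apply) keeps df's index, so the boolean mask aligns with df in pandas 2.x
out = df[df.groupby('employee_id')['workplace'].transform(first_previous_workplace)]
print(out)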

# Output
   employee_id workplace  rank
1            1  facebook     2
5            2   netflix     3
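A variant sketch (not the answerer's code) that drops the sorting assumption and returns the single column the question asks for: sort by employee_id and rank first, then use GroupBy.shift, which gives each row the workplace ranked just before it (the newer job). A row whose newer job is 'google' is the workplace held right before Google. The column name first_previous_workplace_for_googlers is taken from the question's expected output; the data loading mirrors the sample above.

```python
import io

import pandas as pd

csv_text = """employee_id, workplace, rank
1, google, 1
1, facebook, 2
1, apple, 3
2, amazon, 1
2, google, 2
2, netflix, 3
"""
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Order each employee's history from current job (rank 1) backwards
df = df.sort_values(['employee_id', 'rank'])

# Shift workplaces down one row within each employee: every row now holds
# the workplace ranked just before it, i.e. the newer job (NaN for rank 1)
newer_job = df.groupby('employee_id')['workplace'].shift()

# Rows whose newer job is 'google' are the workplaces held right before Google
result = (
    df.loc[newer_job.eq('google'), 'workplace']
    .rename('first_previous_workplace_for_googlers')
    .reset_index(drop=True)
)
print(result.to_csv(index=False))
```

Unlike the mask-with-transform approach, GroupBy.shift is a built-in vectorized operation, so no Python lambda runs per group, which tends to matter on large frames.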