Home > Software engineering >  Column to flag the most max row of a group
Column to flag the most max row of a group

Time:11-21

With this code:

data = pd.DataFrame({'car_plate':['1v','2f','3a','h0','m1'],
'car_brand':['Honda','Honda','Ford','Audi','Audi'],
'last_seen':['01/01/2020','01/02/2020','01/03/2020','01/04/2020','01/05/2020']})

data['last_seen'] = pd.to_datetime(data['last_seen'])

I have the following dataframe:

car_plate car_brand last_seen
0 1v Honda 2020-01-01 00:00:00
1 2f Honda 2020-01-02 00:00:00
2 3a Ford 2020-01-03 00:00:00
3 h0 Audi 2020-01-04 00:00:00
4 m1 Audi 2020-01-05 00:00:00

I would like that for ever car_brand, I mark to 1 the row that have the most recent last_seen, so the output is:

car_plate car_brand last_seen recent
0 1v Honda 2020-01-01 00:00:00 0
1 2f Honda 2020-01-02 00:00:00 1
2 3a Ford 2020-01-03 00:00:00 1
3 h0 Audi 2020-01-04 00:00:00 0
4 m1 Audi 2020-01-05 00:00:00 1

As the 2f is the most recent Honda, 3a is the most recent Ford, and the m1 is the most recent Audi.

I've been thinking of this idea, pseudo code: group by car_brand, get max last_seen, get the index of that row, create a column seen and set to 1 those indexes.

P.S.: I'm also wondering how it would work if I add more levels to the combintion, e.g car_owner, and I would like the most recent car_plate car_owner combination for every car_brand.

CodePudding user response:

data['recent'] = data.index.isin(data.groupby('car_brand')['last_seen'].idxmax()).astype(int)

To answer your question about adding more levels to the combination: yes, you can. Just change 'car_brand' in the above code to ['your', 'columns', 'here'], e.g.

data['recent'] = data.index.isin(data.groupby(['car_brand', 'car_plate', 'car_owner'])['last_seen'].idxmax()).astype(int)

CodePudding user response:

One option is to create a temporary column, containing the last row per group, and then use a boolean mask to assign the 0s and 1s:

last = data.groupby('car_brand').last_seen.transform('last')
data.assign(recent = data.last_seen.eq(last).astype(int))

  car_plate car_brand  last_seen  recent
0        1v     Honda 2020-01-01       0
1        2f     Honda 2020-01-02       1
2        3a      Ford 2020-01-03       1
3        h0      Audi 2020-01-04       0
4        m1      Audi 2020-01-05       1

  • Related