With this code:
data = pd.DataFrame({'car_plate':['1v','2f','3a','h0','m1'],
'car_brand':['Honda','Honda','Ford','Audi','Audi'],
'last_seen':['01/01/2020','01/02/2020','01/03/2020','01/04/2020','01/05/2020']})
data['last_seen'] = pd.to_datetime(data['last_seen'])
I have the following dataframe:
car_plate | car_brand | last_seen | |
---|---|---|---|
0 | 1v | Honda | 2020-01-01 00:00:00 |
1 | 2f | Honda | 2020-01-02 00:00:00 |
2 | 3a | Ford | 2020-01-03 00:00:00 |
3 | h0 | Audi | 2020-01-04 00:00:00 |
4 | m1 | Audi | 2020-01-05 00:00:00 |
I would like that for ever car_brand
, I mark to 1
the row that have the most recent last_seen
, so the output is:
car_plate | car_brand | last_seen | recent | |
---|---|---|---|---|
0 | 1v | Honda | 2020-01-01 00:00:00 | 0 |
1 | 2f | Honda | 2020-01-02 00:00:00 | 1 |
2 | 3a | Ford | 2020-01-03 00:00:00 | 1 |
3 | h0 | Audi | 2020-01-04 00:00:00 | 0 |
4 | m1 | Audi | 2020-01-05 00:00:00 | 1 |
As the 2f
is the most recent Honda, 3a
is the most recent Ford, and the m1
is the most recent Audi.
I've been thinking of this idea, pseudo code: group by car_brand
, get max last_seen
, get the index of that row, create a column seen
and set to 1 those indexes.
P.S.: I'm also wondering how it would work if I add more levels to the combintion, e.g car_owner
, and I would like the most recent car_plate
car_owner
combination for every car_brand
.
CodePudding user response:
data['recent'] = data.index.isin(data.groupby('car_brand')['last_seen'].idxmax()).astype(int)
To answer your question about adding more levels to the combination: yes, you can. Just change 'car_brand'
in the above code to ['your', 'columns', 'here']
, e.g.
data['recent'] = data.index.isin(data.groupby(['car_brand', 'car_plate', 'car_owner'])['last_seen'].idxmax()).astype(int)
CodePudding user response:
One option is to create a temporary column, containing the last row per group, and then use a boolean mask to assign the 0s and 1s:
last = data.groupby('car_brand').last_seen.transform('last')
data.assign(recent = data.last_seen.eq(last).astype(int))
car_plate car_brand last_seen recent
0 1v Honda 2020-01-01 0
1 2f Honda 2020-01-02 1
2 3a Ford 2020-01-03 1
3 h0 Audi 2020-01-04 0
4 m1 Audi 2020-01-05 1