I have a DataFrame like below:
import pandas as pd

ds = pd.DataFrame({'Name': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'Year': ['2021', '2020', '2020', '2019', '2021', '2020', '2020', '2019']})
I want to add a new column 'Breached'. For Name 'A', 'Breached' should be 1 if the year is earlier than the year of A's previous records, and 0 otherwise. The same rule applies to 'B' and 'C'.
So, my output should look like:
Here, 'Breached' is 0 only when the year is 2021, because that is the latest year for the name 'C'.
Any suggestions on how I can do that?
CodePudding user response:
It seems you could use groupby + transform('max') + ne to get a boolean Series that is True if the year is not the latest year for its group and False otherwise. Then convert this Series to int dtype:
ds['Breached'] = ds.groupby('Name')['Year'].transform('max').ne(ds['Year']).astype(int)
Output:
  Name  Year  Breached
0    A  2021         0
1    A  2020         1
2    B  2020         0
3    B  2019         1
4    C  2021         0
5    C  2020         1
6    C  2020         1
7    C  2019         1
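For reference, here is the same idea split into steps as a self-contained sketch. One assumption that is not in the original: 'Year' is converted to numbers first. Comparing the strings directly happens to work for four-digit years, because they sort the same way lexicographically, but numeric comparison is likely safer in general. The names year, latest and is_older are just illustrative.

```python
import pandas as pd

ds = pd.DataFrame({
    'Name': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Year': ['2021', '2020', '2020', '2019', '2021', '2020', '2020', '2019'],
})

# Assumption: compare years as numbers rather than as strings.
year = pd.to_numeric(ds['Year'])

# Latest year per Name, broadcast back onto every row of that group.
latest = year.groupby(ds['Name']).transform('max')

# True where the row's year is not the latest year of its group.
is_older = year.ne(latest)

# 1 for older rows, 0 for rows holding the group's latest year.
ds['Breached'] = is_older.astype(int)
print(ds)
```

Note that this flags every row whose year is below the group's maximum, regardless of row order, which matches the expected result described in the question.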