Adding values in a new column conditionally in pandas dataframe

Time:02-12

I have a DataFrame like the one below:

import pandas as pd

ds = pd.DataFrame({'Name': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'Year': ['2021', '2020', '2020', '2019', '2021', '2020', '2020', '2019']})

I want to add a new column, 'Breached'. For rows with Name 'A', 'Breached' should be 1 if the year is less than the year of a previous record for 'A', and 0 otherwise. The same rule applies to 'B' and to 'C'.
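If "previous records" is read literally as the rows that come earlier in the DataFrame for the same name, the flag depends on row order. Here is a minimal sketch of that reading. It assumes the rows are already in the order shown and that Year can safely be converted to int; the names `year`, `running_max` and `prev_max` are illustrative only. Each year is compared with the largest year among the earlier rows of its group.

```python
import pandas as pd

ds = pd.DataFrame({'Name': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'Year': ['2021', '2020', '2020', '2019', '2021', '2020', '2020', '2019']})

# Assumption: every Year string is a plain integer year.
year = ds['Year'].astype(int)

# Running maximum of the years seen so far (including the current row) per Name.
running_max = year.groupby(ds['Name']).cummax()

# Shift by one row within each Name so each row sees only the max of *earlier* rows.
# The first row of each group gets NaN.
prev_max = running_max.groupby(ds['Name']).shift()

# Comparing with NaN yields False, so the first row of each group gets 0.
ds['Breached'] = (year < prev_max).astype(int)
print(ds)
```

With this sample data the result matches a comparison against each group's overall latest year, because every group's rows run from newest to oldest. The two readings differ only when an older year appears before a newer one for the same name.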

So, my output should look like this:

[image: expected output]

Here, Breached is again 0 only for the 2021 row of 'C', because 2021 is the latest year for 'C'.

Any suggestions on how I can do that?

Answer:

It seems like you could use groupby with transform('max') and ne to get a boolean Series that is True if the year is not the latest year in its group and False otherwise. Then convert this Series to int dtype:

ds['Breached'] = ds.groupby('Name')['Year'].transform('max').ne(ds['Year']).astype(int)
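The one-liner can be unpacked into its three steps so each intermediate Series can be inspected. This sketch also converts Year to int first. That conversion is an addition, not part of the answer: because the column holds strings, max compares them lexicographically, which gives the right result only while every year has the same number of digits. The variable names are illustrative.

```python
import pandas as pd

ds = pd.DataFrame({'Name': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'Year': ['2021', '2020', '2020', '2019', '2021', '2020', '2020', '2019']})

# Convert the string years to integers so max() compares numerically.
year = ds['Year'].astype(int)

# Step 1: broadcast each Name's latest year back onto every row of that Name.
latest = year.groupby(ds['Name']).transform('max')

# Step 2: True where the row's year is not its group's latest year.
not_latest = latest.ne(year)

# Step 3: turn the booleans into 0/1.
ds['Breached'] = not_latest.astype(int)
print(ds)
```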

Output:

  Name  Year  Breached
0    A  2021         0
1    A  2020         1
2    B  2020         0
3    B  2019         1
4    C  2021         0
5    C  2020         1
6    C  2020         1
7    C  2019         1