Home > database >  For data get the Companies have increasing no of participants in last 3 years in Pandas
For data get the Companies have increasing no of participants in last 3 years in Pandas

Time:02-25

Below is the Data given.

year    Company/Account Name    participants 
2019    Willis Towers Watson    418
2020    Willis Towers Watson    568
2021    Willis Towers Watson    57
2019    White Plains Hospital   1904
2020    White Plains Hospital   742
2021    White Plains Hospital   3710
2019    Westchester Flying Club 310
2020    Westchester Flying Club 463
2021    Westchester Flying Club 562
2019    Us Open                 282
2020    Us Open                 742
2021    Us Open                 812
2019    Upledger Institute      625
2020    Upledger Institute      49
2021    Upledger Institute      152

Output Should be

  1. Westchester Flying Club

  2. Us Open

Above two company having increasing number of participants in last three years.

Data:

{'year': [2019, 2020, 2021, 2019, 2020, 2021, 2019, 2020, 
          2021, 2019, 2020, 2021, 2019, 2020, 2021],
 'Company/Account Name': ['Willis Towers Watson', 'Willis Towers Watson', 
                          'Willis Towers Watson', 'White Plains Hospital', 
                          'White Plains Hospital', 'White Plains Hospital',
                          'Westchester Flying Club', 'Westchester Flying Club', 
                          'Westchester Flying Club', 'Us Open', 'Us Open', 
                          'Us Open', 'Upledger Institute', 'Upledger Institute',
                          'Upledger Institute'],
 'participants': [418, 568, 57, 1904, 742, 3710, 310, 
                  463, 562, 282, 742, 812, 625, 49, 152]}

CodePudding user response:

We could groupby diff gt to get a boolean Series which could be returns True if participants increased compared to previous year, False otherwise. Then use groupby all to construct a mask to filter companies that experienced increasing participation every year.

Finally, use the mask to filter the relevant companies:

s = (df.groupby('Company/Account Name')['participants'].diff().fillna(0.1).gt(0)
     .groupby(df['Company/Account Name']).all())
out = s.index[s].tolist()

Instead of doing 2 groupbys we can also apply a lambda that does the same job:

s = df.groupby('Company/Account Name')['participants'].apply(lambda x: x.diff().dropna().gt(0).all())

Output:

['Us Open', 'Westchester Flying Club']
  • Related