Below is the Data given.
year Company/Account Name participants
2019 Willis Towers Watson 418
2020 Willis Towers Watson 568
2021 Willis Towers Watson 57
2019 White Plains Hospital 1904
2020 White Plains Hospital 742
2021 White Plains Hospital 3710
2019 Westchester Flying Club 310
2020 Westchester Flying Club 463
2021 Westchester Flying Club 562
2019 Us Open 282
2020 Us Open 742
2021 Us Open 812
2019 Upledger Institute 625
2020 Upledger Institute 49
2021 Upledger Institute 152
Output Should be
Westchester Flying Club
Us Open
Above two company having increasing number of participants in last three years.
Data:
{'year': [2019, 2020, 2021, 2019, 2020, 2021, 2019, 2020,
2021, 2019, 2020, 2021, 2019, 2020, 2021],
'Company/Account Name': ['Willis Towers Watson', 'Willis Towers Watson',
'Willis Towers Watson', 'White Plains Hospital',
'White Plains Hospital', 'White Plains Hospital',
'Westchester Flying Club', 'Westchester Flying Club',
'Westchester Flying Club', 'Us Open', 'Us Open',
'Us Open', 'Upledger Institute', 'Upledger Institute',
'Upledger Institute'],
'participants': [418, 568, 57, 1904, 742, 3710, 310,
463, 562, 282, 742, 812, 625, 49, 152]}
CodePudding user response:
We could groupby
diff
gt
to get a boolean Series which could be returns True if participants increased compared to previous year, False otherwise. Then use groupby
all
to construct a mask to filter companies that experienced increasing participation every year.
Finally, use the mask to filter the relevant companies:
s = (df.groupby('Company/Account Name')['participants'].diff().fillna(0.1).gt(0)
.groupby(df['Company/Account Name']).all())
out = s.index[s].tolist()
Instead of doing 2 groupby
s we can also apply
a lambda that does the same job:
s = df.groupby('Company/Account Name')['participants'].apply(lambda x: x.diff().dropna().gt(0).all())
Output:
['Us Open', 'Westchester Flying Club']