I need only those company name with year and estimate grp size those are present in all three year eg 2019, 2020, 2021
year Company/Account Name EstimatedGroupSize
0 2019 Unknown 19550
1 2019 Mayo Clinic 7754
2 2019 Deloitte 6432
3 2019 Rizona State 5582
4 2019 Intel Corporation 4595
5 2020 Deloitte 4063
6 2020 Unknown 3490
7 2021 Unknown 3484
8 2020 Intel Corporation 3460
9 2021 Intel Corporation 3433
10 2021 Deloitte 3250
So my output should be
year Company/Account Name EstimatedGroupSize
0 2019 Unknown 19550
2 2019 Deloitte 6432
4 2019 Intel Corporation 4595
5 2020 Deloitte 4063
6 2020 Unknown 3490
7 2021 Unknown 3484
8 2020 Intel Corporation 3460
9 2021 Intel Corporation 3433
10 2021 Deloitte 3250
CodePudding user response:
Here is solution for filter year
with Company/Account Name
if present at least one row and filter original DataFrame by inner merge
:
#if need filter ony some years first
df = df[df['year'].isin([2019, 2020, 2021])]
df1 = pd.crosstab(df['year'], df['Company/Account Name'])
df = df.merge(df1.loc[:, df1.gt(0).all()].stack().index.to_frame(index=False))
print (df)
year Company/Account Name EstimatedGroupSize
0 2019 Unknown 19550
1 2019 Deloitte 6432
2 2019 Intel Corporation 4595
3 2020 Deloitte 4063
4 2020 Unknown 3490
5 2021 Unknown 3484
6 2020 Intel Corporation 3460
7 2021 Intel Corporation 3433
8 2021 Deloitte 3250
CodePudding user response:
IIUC,
years = [2019, 2020, 2021]
new_df = \
df.loc[pd.get_dummies(df['year'])
.groupby(df['Company/Account Name'])[years]
.transform('sum')
.gt(0)
.all(axis=1)]
print(new_df)
year Company/Account Name EstimatedGroupSize
0 2019 Unknown 19550
2 2019 Deloitte 6432
4 2019 Intel-Corporation 4595
5 2020 Deloitte 4063
6 2020 Unknown 3490
7 2021 Unknown 3484
8 2020 Intel-Corporation 3460
9 2021 Intel-Corporation 3433
10 2021 Deloitte 3250
Or:
years = [2019, 2020, 2021]
new_df = \
df.groupby('Company/Account Name')\
.filter(lambda x: np.isin(years, x['year']).all())