How to filter the groupby dataframe in a multiindexed data-CodePudding

I have a data frame like this:

df:
ID  Group   Score1
1     A       45       
2     A       12
3     B       33
3     C       64
.     .       .
.     .       .
15000 .       .

There are thousands of IDs that are repeated and 15 groups in total. For every group, I want to display

Number of unique IDs greater than or equal to Median Score
Number of unique IDs greater than Score 40
Number of unique IDs below the first quartile(0-25th percentile range) of the Score column
Number of unique IDs above the third quartile(75 - 100th percentile range) of the Score column

So my output looks like this: Output:

Group  No of unique  No of unique  No of unique      No of unique
      IDs >= Median  IDs > 40       IDs below         IDs above the
                                  first quartile      third quartile
 A         56           25             17                 45
 B         33           78             28                 62
 C         23           36             38                 33
 .
 .
 P         39           28             26                 41

Can anyone help me with this? Many Thanks!

CodePudding user response：

Hope this works. First I created the four corresponding columns first in the original dataframe. Then create df1 which is the dataframe df with unique ID's. Finally, using pd.groupby() you can create the desired output:

df = pd.DataFrame({'ID': list(range(100)),
                   'Group': list('ABCDE')*20,
                   'Score1': np.random.randint(10, 99, 100)})

df['No of unique IDs >= Median'] = df.Score1 >= df.Score1.median()
df['No of unique IDs > 40'] = df.Score1 > 40
df['No of unique IDs < first quartile'] = df.Score1 < df.Score1.quantile(0.25)
df['No of unique IDs > third quartile'] = df.Score1 > df.Score1.quantile(0.75)

df1 = df.groupby(['ID', 'Group']).any().reset_index()

df1.groupby('Group').agg({'No of unique IDs >= Median': 'sum',
                          'No of unique IDs > 40': 'sum',
                          'No of unique IDs < first quartile': 'sum',
                          'No of unique IDs > third quartile': 'sum'})