Pandas: How to do groupby with conditions in pandas dataframe?-CodePudding

I should do groupby(country and product) and Value column should contain count(id) where status is closed and I need to return all the remaining columns.

Sample input

id        status    ticket_time           product      country    name
126       open      2021-10-04 01:20:00   Broad        A          metric
299       open      2021-10-02 00:00:00   Fixed        B          metric
376       closed    2021-10-01 00:00:00   Fixed        C          metric
370       closed    2021-10-04 00:00:00   Broad        C          metric
372       closed    2021-10-04 00:00:00   TV           D          metric
605       closed    2021-10-01 00:00:00   TV           D          metric

OUTPUT FORMAT SAMPLE

country  product   name      ticket_time            Value(count(id)where status closed)
D        TV        metric    YYYY-MM-DD HH:MM:SS          2
C        Broad     metric    YYYY-MM-DD HH:MM:SS          1
C        Fixed     metric    YYYY-MM-DD HH:MM:SS          1
....      ...      ....                  ...               ...

I tried the below code:


df1 = df[df['status'] == 'closed']
df1['Value'] = df1.groupby(['country', 'product'])['status'].transform('size')

df = df1.drop_duplicates(['country', 'product']).drop('status',axis=1).drop(['id'], axis = 1)

Is there any better way to approach this?

CodePudding user response：

Don't use groupby transform but groupby agg:

(df.loc[df['status'].eq('closed')]
   .groupby(['country', 'product'], as_index=False)
   .agg({'name': 'first', 'ticket_time': 'first', 'status': 'size'})
   .rename(columns={'status': 'Value count(size)'})
)

output:

  country product    name          ticket_time  Value count(size)
0       C   Broad  metric  2021-10-04 00:00:00                  1
1       C   Fixed  metric  2021-10-01 00:00:00                  1
2       D      TV  metric  2021-10-04 00:00:00                  2