I should do groupby(country and product) and Value column should contain count(id) where status is closed and I need to return all the remaining columns.
Sample input
id status ticket_time product country name
126 open 2021-10-04 01:20:00 Broad A metric
299 open 2021-10-02 00:00:00 Fixed B metric
376 closed 2021-10-01 00:00:00 Fixed C metric
370 closed 2021-10-04 00:00:00 Broad C metric
372 closed 2021-10-04 00:00:00 TV D metric
605 closed 2021-10-01 00:00:00 TV D metric
OUTPUT FORMAT SAMPLE
country product name ticket_time Value(count(id)where status closed)
D TV metric YYYY-MM-DD HH:MM:SS 2
C Broad metric YYYY-MM-DD HH:MM:SS 1
C Fixed metric YYYY-MM-DD HH:MM:SS 1
.... ... .... ... ...
I tried the below code:
df1 = df[df['status'] == 'closed']
df1['Value'] = df1.groupby(['country', 'product'])['status'].transform('size')
df = df1.drop_duplicates(['country', 'product']).drop('status',axis=1).drop(['id'], axis = 1)
Is there any better way to approach this?
CodePudding user response:
Don't use groupby
transform
but groupby
agg
:
(df.loc[df['status'].eq('closed')]
.groupby(['country', 'product'], as_index=False)
.agg({'name': 'first', 'ticket_time': 'first', 'status': 'size'})
.rename(columns={'status': 'Value count(size)'})
)
output:
country product name ticket_time Value count(size)
0 C Broad metric 2021-10-04 00:00:00 1
1 C Fixed metric 2021-10-01 00:00:00 1
2 D TV metric 2021-10-04 00:00:00 2