I have a dataframe like the one below:
import pandas as pd
df = pd.DataFrame({'subject_id':[1,1,1,1,1,1,1,2,2,2,2,2],
'time_1' :['2017-04-03 12:35:00','2017-04-03 12:50:00','2018-04-05 12:59:00','2018-05-04 13:14:00','2017-05-05 13:37:00','2018-07-06 13:39:00','2018-07-08 11:30:00','2017-04-08 16:00:00','2019-04-09 22:00:00','2019-04-11 04:00:00','2018-04-13 04:30:00','2017-04-14 08:00:00'],
'val' :[5,5,5,5,1,6,5,5,8,3,4,6],
'Prod_id':['A','B','C','A','E','Q','G','F','G','H','J','A']})
df['time_1'] = pd.to_datetime(df['time_1'])
I would like to do the following:
a) group by subject_id and time_1 using freq='3M'
b) return only the aggregated values of the Prod_id column (and drop the index)
So, I tried the below
df.groupby(['subject_id',pd.Grouper(key='time_1', freq='3M')])['Prod_id'].nunique()
Although the above works, it also returns the group-by columns in the output.
So I tried the following, using as_index=False:
df.groupby(['subject_id',pd.Grouper(key='time_1', freq='3M'),as_index=False])['Prod_id'].nunique()
But it still didn't give the expected output.
I expect my output to look like this:
uniq_prod_cnt
2
1
1
3
2
1
2
CodePudding user response:
You are in one of those cases in which you need to get rid of the index afterwards.
To get the exact shown output:
(df.groupby(['subject_id',pd.Grouper(key='time_1', freq='3M')])
.agg(uniq_prod_cnt=('Prod_id', 'nunique'))
.reset_index(drop=True)
)
output:
uniq_prod_cnt
0 2
1 1
2 1
3 3
4 2
5 1
6 2
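For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. One assumption worth flagging: pandas 2.2 deprecated the 'M' frequency alias in favour of 'ME', so the `month_end_grouper` helper below (a hypothetical name, not part of pandas) tries the newer alias first and falls back to the older one. The 'val' column is left out because it plays no part in the count.

```python
import pandas as pd

# Same data as in the question (the unused 'val' column is omitted).
df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'time_1': pd.to_datetime([
        '2017-04-03 12:35:00', '2017-04-03 12:50:00', '2018-04-05 12:59:00',
        '2018-05-04 13:14:00', '2017-05-05 13:37:00', '2018-07-06 13:39:00',
        '2018-07-08 11:30:00', '2017-04-08 16:00:00', '2019-04-09 22:00:00',
        '2019-04-11 04:00:00', '2018-04-13 04:30:00', '2017-04-14 08:00:00',
    ]),
    'Prod_id': ['A', 'B', 'C', 'A', 'E', 'Q', 'G', 'F', 'G', 'H', 'J', 'A'],
})


def month_end_grouper(key, months):
    """Hypothetical helper: build a month-end Grouper on any pandas version."""
    try:
        # Alias used by pandas >= 2.2
        return pd.Grouper(key=key, freq=f'{months}ME')
    except ValueError:
        # Older pandas only knows the 'M' alias
        return pd.Grouper(key=key, freq=f'{months}M')


result = (
    df.groupby(['subject_id', month_end_grouper('time_1', 3)])
      .agg(uniq_prod_cnt=('Prod_id', 'nunique'))  # named aggregation
      .reset_index(drop=True)                     # discard the group keys
)
print(result)
```

The group keys still form the index right after `agg`; it is `reset_index(drop=True)` that throws them away instead of turning them into columns.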
CodePudding user response:
If you want an array without the index, use the values attribute:
df.groupby(['subject_id',pd.Grouper(key='time_1', freq='3M')])['Prod_id'].nunique().values
output:
array([2, 1, 1, 3, 2, 1, 2], dtype=int64)
If you want a Series with a range index, use reset_index(drop=True):
df.groupby(['subject_id',pd.Grouper(key='time_1', freq='3M')])['Prod_id'].nunique().reset_index(drop=True)
output:
0 2
1 1
2 1
3 3
4 2
5 1
6 2
Name: Prod_id, dtype: int64
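As a side note, the pandas documentation recommends `Series.to_numpy()` over the `values` attribute in new code, and `tolist()` is handy if a plain Python list is enough. A small sketch of both, under the assumption that the month-end alias may need to be '3ME' (pandas 2.2 and later) or '3M' (older versions):

```python
import pandas as pd

# Same data as in the question (the unused 'val' column is omitted).
df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'time_1': pd.to_datetime([
        '2017-04-03 12:35:00', '2017-04-03 12:50:00', '2018-04-05 12:59:00',
        '2018-05-04 13:14:00', '2017-05-05 13:37:00', '2018-07-06 13:39:00',
        '2018-07-08 11:30:00', '2017-04-08 16:00:00', '2019-04-09 22:00:00',
        '2019-04-11 04:00:00', '2018-04-13 04:30:00', '2017-04-14 08:00:00',
    ]),
    'Prod_id': ['A', 'B', 'C', 'A', 'E', 'Q', 'G', 'F', 'G', 'H', 'J', 'A'],
})

# Pick whichever month-end alias the installed pandas accepts.
try:
    grouper = pd.Grouper(key='time_1', freq='3ME')
except ValueError:
    grouper = pd.Grouper(key='time_1', freq='3M')

counts = df.groupby(['subject_id', grouper])['Prod_id'].nunique()

as_array = counts.to_numpy()  # NumPy array, index discarded
as_list = counts.tolist()     # plain Python list of ints

print(as_array)
print(as_list)
```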