pandas groupby and return a series of one column

I have a dataframe like the one below:

import pandas as pd

df = pd.DataFrame({'subject_id':[1,1,1,1,1,1,1,2,2,2,2,2],
                   'time_1' :['2017-04-03 12:35:00','2017-04-03 12:50:00','2018-04-05 12:59:00','2018-05-04 13:14:00','2017-05-05 13:37:00','2018-07-06 13:39:00','2018-07-08 11:30:00','2017-04-08 16:00:00','2019-04-09 22:00:00','2019-04-11 04:00:00','2018-04-13 04:30:00','2017-04-14 08:00:00'],
                   'val' :[5,5,5,5,1,6,5,5,8,3,4,6],
                   'Prod_id':['A','B','C','A','E','Q','G','F','G','H','J','A']})
df['time_1'] = pd.to_datetime(df['time_1'])

I would like to do the following:

a) group by subject_id and time_1 using freq='3M'

b) return only the aggregated values of Prod_id column (and drop index)

So I tried the following:

df.groupby(['subject_id',pd.Grouper(key='time_1', freq='3M')])['Prod_id'].nunique()

This works, but the output also includes the group-by columns (as the index).

So I tried again with as_index=False:

df.groupby(['subject_id',pd.Grouper(key='time_1', freq='3M')],as_index=False)['Prod_id'].nunique()

But it still didn't give the expected output.

I expect my output to look like this:

uniq_prod_cnt

    2
    1
    1
    3
    2
    1
    2

CodePudding user response：

This is one of those cases where you need to drop the index after aggregating.

To get exactly the output shown:

(df.groupby(['subject_id',pd.Grouper(key='time_1', freq='3M')])
   .agg(uniq_prod_cnt=('Prod_id', 'nunique'))
   .reset_index(drop=True)
)

output:

   uniq_prod_cnt
0              2
1              1
2              1
3              3
4              2
5              1
6              2
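
For anyone who wants to run this end to end, here is a self-contained sketch of the same approach using the sample data from the question. One assumption on my part: it passes pd.offsets.MonthEnd(3) to pd.Grouper instead of the string '3M', because newer pandas releases deprecate the 'M' alias in favor of 'ME'. The offset object should produce the same three-month, month-end bins. The variable names three_month_bins and result are only illustrative.

```python
import pandas as pd

# Sample data from the question (the unused 'val' column is omitted).
df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'time_1': ['2017-04-03 12:35:00', '2017-04-03 12:50:00', '2018-04-05 12:59:00',
               '2018-05-04 13:14:00', '2017-05-05 13:37:00', '2018-07-06 13:39:00',
               '2018-07-08 11:30:00', '2017-04-08 16:00:00', '2019-04-09 22:00:00',
               '2019-04-11 04:00:00', '2018-04-13 04:30:00', '2017-04-14 08:00:00'],
    'Prod_id': ['A', 'B', 'C', 'A', 'E', 'Q', 'G', 'F', 'G', 'H', 'J', 'A'],
})
df['time_1'] = pd.to_datetime(df['time_1'])

# Assumption: an offset object stands in for the '3M' string alias.
three_month_bins = pd.Grouper(key='time_1', freq=pd.offsets.MonthEnd(3))

result = (
    df.groupby(['subject_id', three_month_bins])
      .agg(uniq_prod_cnt=('Prod_id', 'nunique'))  # named aggregation sets the column name
      .reset_index(drop=True)                     # discard the (subject_id, time_1) index
)
print(result)
```

The result is a one-column DataFrame; result['uniq_prod_cnt'] gives the same values as a Series.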

CodePudding user response：

If you want a plain array without the index, use the values attribute:

df.groupby(['subject_id',pd.Grouper(key='time_1', freq='3M')])['Prod_id'].nunique().values

output:

array([2, 1, 1, 3, 2, 1, 2], dtype=int64)
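
As a side note, the pandas documentation suggests to_numpy() as the preferred way to get a NumPy array out of a Series, so a variant of the line above might look like the sketch below. It rebuilds the question's sample data so it can run on its own, and it assumes pd.offsets.MonthEnd(3) as a stand-in for the '3M' alias (which newer pandas versions deprecate). The name counts is only illustrative.

```python
import pandas as pd

# Sample data from the question (the unused 'val' column is omitted).
df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'time_1': ['2017-04-03 12:35:00', '2017-04-03 12:50:00', '2018-04-05 12:59:00',
               '2018-05-04 13:14:00', '2017-05-05 13:37:00', '2018-07-06 13:39:00',
               '2018-07-08 11:30:00', '2017-04-08 16:00:00', '2019-04-09 22:00:00',
               '2019-04-11 04:00:00', '2018-04-13 04:30:00', '2017-04-14 08:00:00'],
    'Prod_id': ['A', 'B', 'C', 'A', 'E', 'Q', 'G', 'F', 'G', 'H', 'J', 'A'],
})
df['time_1'] = pd.to_datetime(df['time_1'])

# Assumption: an offset object stands in for the '3M' string alias.
grouper = pd.Grouper(key='time_1', freq=pd.offsets.MonthEnd(3))

# to_numpy() returns the counts as a NumPy array, with no index attached.
counts = df.groupby(['subject_id', grouper])['Prod_id'].nunique().to_numpy()
print(counts)
```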

If you want a Series with a default RangeIndex, use reset_index(drop=True):

df.groupby(['subject_id',pd.Grouper(key='time_1', freq='3M')])['Prod_id'].nunique().reset_index(drop=True)

output:

0    2
1    1
2    1
3    3
4    2
5    1
6    2
Name: Prod_id, dtype: int64
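
The Series above is still named Prod_id, while the question asks for uniq_prod_cnt. One possible way to get that name is to chain Series.rename, as in the sketch below. It rebuilds the question's sample data so it can run standalone, and it assumes pd.offsets.MonthEnd(3) in place of the '3M' alias, which newer pandas versions deprecate. The name uniq_counts is only illustrative.

```python
import pandas as pd

# Sample data from the question (the unused 'val' column is omitted).
df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'time_1': ['2017-04-03 12:35:00', '2017-04-03 12:50:00', '2018-04-05 12:59:00',
               '2018-05-04 13:14:00', '2017-05-05 13:37:00', '2018-07-06 13:39:00',
               '2018-07-08 11:30:00', '2017-04-08 16:00:00', '2019-04-09 22:00:00',
               '2019-04-11 04:00:00', '2018-04-13 04:30:00', '2017-04-14 08:00:00'],
    'Prod_id': ['A', 'B', 'C', 'A', 'E', 'Q', 'G', 'F', 'G', 'H', 'J', 'A'],
})
df['time_1'] = pd.to_datetime(df['time_1'])

# Assumption: an offset object stands in for the '3M' string alias.
grouper = pd.Grouper(key='time_1', freq=pd.offsets.MonthEnd(3))

uniq_counts = (
    df.groupby(['subject_id', grouper])['Prod_id']
      .nunique()
      .rename('uniq_prod_cnt')   # a scalar argument renames the Series itself
      .reset_index(drop=True)    # replace the group keys with a 0..n-1 index
)
print(uniq_counts)
```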