How to count the unique values per date using Python

I am practicing data analytics and I am stuck on one problem.

I grouped the DataFrame by DATE PURCHASED and applied unique() because I want to count the unique values for each purchase date.

training_groupby = training.groupby('DATE PURCHASED')['Account - Store Name'].unique().to_frame()

The result looks like this: [image: GROUPBY DATE PURCHASED]

Now that the data has been aggregated, I want to count the items in that column, so I used .split(',').

training_groupby['Account - Store Name'].apply(lambda x: x.split(','))

but I got this error:

AttributeError: 'numpy.ndarray' object has no attribute 'split'

Can someone help me count the number of unique values per DATE PURCHASED? I've been trying to solve this for almost a week. I searched YouTube and Google but couldn't find anything that helps.

CodePudding user response：

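I think this is what you want. Each cell is already an array of unique values, not a string, so take its length instead of splitting it: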

training_groupby["Total Purchased"] = training_groupby["Account - Store Name"].apply(lambda x: len(set(x)))
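
To see why .split(',') fails and why taking the length works, here is a minimal sketch. The sample rows are made up for illustration and only stand in for the asker's training DataFrame; the column names are taken from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `training` frame.
training = pd.DataFrame({
    "DATE PURCHASED": ["13/01/2022", "13/01/2022", "13/01/2022", "14/01/2022"],
    "Account - Store Name": [
        "Landmark Makati", "Landmark Nuvali", "Landmark Makati", "Landmark Nuvali",
    ],
})

training_groupby = (
    training.groupby("DATE PURCHASED")["Account - Store Name"].unique().to_frame()
)

# Each cell holds an array of unique values, not a comma-separated string.
# Arrays have no .split method, which is what the AttributeError reports.
first_cell = training_groupby["Account - Store Name"].iloc[0]
print(type(first_cell))

# An array already has a length, so no splitting is needed to count its items.
training_groupby["Total Purchased"] = training_groupby["Account - Store Name"].apply(len)
print(training_groupby)
```

Because unique() has already removed duplicates within each date, len(x) and len(set(x)) should give the same count here.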

CodePudding user response：

You can do multiple aggregations in the same pandas.DataFrame.groupby call:

Try this:

out = (
    training
    .groupby(['DATE PURCHASED'])
    .agg(**{
        'Account - Store Name': ('Account - Store Name', 'unique'),
        'Items Count': ('Account - Store Name', 'nunique'),
    })
)

# Output:

print(out)

                                      Account - Store Name  Items Count
DATE PURCHASED                                                         
13/01/2022              [Landmark Makati, Landmark Nuvali]            2
14/01/2022                               [Landmark Nuvali]            1
15/01/2022            [Robinsons Dolores, Landmark Nuvali]            2
16/01/2022      [Robinsons Ilocos Norte, Landmarj Trinoma]            2
19/01/2022                              [Shopwise Alabang]            1
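
If only the count is needed and not the list of store names, nunique can be called directly on the grouped column. This is a minimal sketch under the same assumptions as the question: the sample rows are invented for illustration, and only the column names come from the original post.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `training` frame.
training = pd.DataFrame({
    "DATE PURCHASED": ["13/01/2022", "13/01/2022", "13/01/2022", "14/01/2022"],
    "Account - Store Name": [
        "Landmark Makati", "Landmark Nuvali", "Landmark Makati", "Landmark Nuvali",
    ],
})

# Count distinct store names per purchase date in a single step.
unique_counts = (
    training.groupby("DATE PURCHASED")["Account - Store Name"]
    .nunique()
    .rename("Items Count")
)
print(unique_counts)
```

Note that nunique skips missing values by default, so a date whose store name is NaN would not add to the count unless dropna=False is passed.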