Home > OS >  Python: How to use value_counts() inside .agg function in pandas?
Python: How to use value_counts() inside .agg function in pandas?

Time:10-13

Input dataframe df looks like:

item    row
Apple   12
Apple   12
Apple   13
Orange  13
Orange  14
Lemon   14

Output dataframe need to be

item    unique_row      nunique_row     count
Apple   {12,13}             2            {2,1}
Orange  {13,14}             2            {1,1}
Lemon   {14}                1            {1}

Tried Code:

df.groupby('item', as_index=False)['row'].agg({'unique_row': lambda x: set(x)
                                                ,'nunique_row': lambda x: len(set(x))})

So here, not sure how to add condition inside .agg function to generate column 'count'. Column 'count' represents number of value_count for each row value. Any help will be appreciated. Thank You!

CodePudding user response:

Solution

s = df.value_counts()
g = s.reset_index(name='count').groupby('item')

g.agg(list).join(g.size().rename('nunique_row'))

Working

  • Calculate the groupsize per item and row using value_counts
  • group the preceding counts by item
    • agg with list to get the list of unique rows and corresponding counts
    • agg with size to get number of unique rows

Result

             row   count  nunique_row
item                                 
Apple   [12, 13]  [2, 1]            2
Lemon       [14]     [1]            1
Orange  [13, 14]  [1, 1]            2

CodePudding user response:

You need to convert to list or set:

(df.groupby('item', as_index=False)['row']
   .agg({'unique_row': lambda x: list(x.unique()),
         'nunique_row': lambda x: len(set(x)),
         'count': lambda x: list(x.value_counts(sort=False)), # or set(x.value_counts())
        })
)

output:

     item unique_row  nunique_row   count
0   Apple   [12, 13]            2  [2, 1]
1   Lemon       [14]            1     [1]
2  Orange   [13, 14]            2  [1, 1]
  • Related