Input dataframe df
looks like:
item row
Apple 12
Apple 12
Apple 13
Orange 13
Orange 14
Lemon 14
Output dataframe need to be
item unique_row nunique_row count
Apple {12,13} 2 {2,1}
Orange {13,14} 2 {1,1}
Lemon {14} 1 {1}
Tried Code:
df.groupby('item', as_index=False)['row'].agg({'unique_row': lambda x: set(x)
,'nunique_row': lambda x: len(set(x))})
So here, not sure how to add condition inside .agg
function to generate column 'count
'. Column 'count
' represents number of value_count
for each row value.
Any help will be appreciated. Thank You!
CodePudding user response:
Solution
s = df.value_counts()
g = s.reset_index(name='count').groupby('item')
g.agg(list).join(g.size().rename('nunique_row'))
Working
- Calculate the groupsize per
item
androw
usingvalue_counts
- group the preceding counts by
item
agg
withlist
to get the list of unique rows and corresponding countsagg
withsize
to get number of unique rows
Result
row count nunique_row
item
Apple [12, 13] [2, 1] 2
Lemon [14] [1] 1
Orange [13, 14] [1, 1] 2
CodePudding user response:
You need to convert to list or set:
(df.groupby('item', as_index=False)['row']
.agg({'unique_row': lambda x: list(x.unique()),
'nunique_row': lambda x: len(set(x)),
'count': lambda x: list(x.value_counts(sort=False)), # or set(x.value_counts())
})
)
output:
item unique_row nunique_row count
0 Apple [12, 13] 2 [2, 1]
1 Lemon [14] 1 [1]
2 Orange [13, 14] 2 [1, 1]