I have found individual answers for counting group-by results and for returning a delimited list of unique values, but not for counting those unique values within the group by.
I want to create a delimited field with only the unique values based on the first column, and then create a field that counts those unique values.
import pandas as pd
companies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google', 'Microsoft', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android', 'OS', 'Search']
df = pd.DataFrame({'company' : companies, 'product':products })
#this creates the non-unique delimited list and counts total occurrences
df2 = df.groupby('company').product.agg([('count', 'count'), ('product', ', '.join)])
#this creates the unique delimited list, but doesn't count occurrences
df3 = (df.groupby('company')['product']
.apply(lambda x: list(x.unique()))
.reset_index())
#This is my desired output
company product count
0 Amazon [E-comm] 1
1 Facebook [Social Media] 1
2 Google [Search, Android] 2
3 Microsoft [OS, X-box] 2
I haven't been able to blend these two successfully, but I suspect it should be relatively simple and straightforward.
CodePudding user response:
You can drop_duplicates first and then agg:
print(df.drop_duplicates().groupby("company")["product"].agg([list, "count"]))
list count
company
Amazon [E-comm] 1
Facebook [Social Media] 1
Google [Search, Android] 2
Microsoft [OS, X-box] 2
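If you also need the exact column names from the desired output, one way (a sketch building on the snippet above; the rename target is my choice, not something the question requires) is to rename the list column and reset the index:

```python
import pandas as pd

companies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google', 'Microsoft', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android', 'OS', 'Search']
df = pd.DataFrame({'company': companies, 'product': products})

# drop_duplicates removes repeated (company, product) rows,
# so the per-group list and count only see unique products
out = (df.drop_duplicates()
         .groupby('company')['product']
         .agg([list, 'count'])
         .rename(columns={'list': 'product'})
         .reset_index())
print(out)
```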
Or using unique and nunique:
print(df.groupby("company")["product"].agg(["unique", "nunique"]))
unique nunique
company
Amazon [E-comm] 1
Facebook [Social Media] 1
Google [Search, Android] 2
Microsoft [OS, X-box] 2
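Named aggregation can produce the desired column names directly (a sketch; the keyword names product and count are my choice to match the desired output):

```python
import pandas as pd

companies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google', 'Microsoft', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android', 'OS', 'Search']
df = pd.DataFrame({'company': companies, 'product': products})

# keyword=funcname form: each keyword becomes an output column
out = (df.groupby('company')['product']
         .agg(product='unique', count='nunique')
         .reset_index())
print(out)
```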
CodePudding user response:
You can use str.len in your code:
df3 = (df.groupby('company')['product']
.apply(lambda x: list(x.unique()))
.reset_index()
.assign(count=lambda d: d['product'].str.len()) ## added line
)
output:
company product count
0 Amazon [E-comm] 1
1 Facebook [Social Media] 1
2 Google [Search, Android] 2
3 Microsoft [OS, X-box] 2
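Since each entry in the product column is a Python list here, str.len returns each list's length; mapping the built-in len is an equivalent, slightly more explicit spelling (a sketch of the same approach, not a change in behavior):

```python
import pandas as pd

companies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google', 'Microsoft', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android', 'OS', 'Search']
df = pd.DataFrame({'company': companies, 'product': products})

df3 = (df.groupby('company')['product']
         .apply(lambda x: list(x.unique()))
         .reset_index()
         # map(len) counts the elements of each unique-product list
         .assign(count=lambda d: d['product'].map(len)))
print(df3)
```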