I have two columns:
category names
vegetables [broccoli, ginger]
fruit [apple, grapes, dragonfruit]
vegetables [pine]
vegetables [bottleguord, pumpkin]
fruit [mango, guava]
I need to find the unique values that each category contains. This is how you could create the DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
                   'Names': ['[broccoli, ginger]', '[apple, grapes, dragonfruit]', '[pine]', '[bottleguord, pumpkin]', '[mango, guava]']})
This is how I was trying to do it:
g = df.groupby('category')['Names'].apply(lambda x: list(np.unique(x)))
Expected output:
index = ['Vegetables' ,'Fruits']
new_df = pd.DataFrame(index=index)
This is how I tweaked the code: print(df.assign(Names=df['Names'].str[1:-1].str.split(', ')).explode('Names').groupby('category')['Names'].apply(lambda x: len(set(x))))
category len_unique_val
Vegetables 5
Fruits 4
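A minimal self-contained sketch of the counting approach above, using `nunique` in place of `len(set(x))` (the variable name `counts` is mine, not from the original post):

```python
import pandas as pd

df = pd.DataFrame({'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
                   'Names': ['[broccoli, ginger]', '[apple, grapes, dragonfruit]',
                             '[pine]', '[bottleguord, pumpkin]', '[mango, guava]']})

# strip the surrounding brackets, split into real lists, explode to one
# name per row, then count the distinct names in each category
counts = (df.assign(Names=df['Names'].str[1:-1].str.split(', '))
            .explode('Names')
            .groupby('category')['Names']
            .nunique())
print(counts)
```

`nunique` already counts distinct values per group, so the `lambda x: len(set(x))` can be dropped.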
CodePudding user response:
You can explode, groupby and apply to transform each group into a Python set.
assuming lists as input:
(df.explode('Names')
   .groupby('category')
   ['Names']
   .apply(set)
)
output:
category
fruit         {'dragonfruit', 'guava', 'grapes', 'apple', 'mango'}
vegetables    {'ginger', 'pine', 'broccoli', 'bottleguord', 'pumpkin'}
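The lists-as-input variant presupposes the `Names` column holds actual Python lists rather than bracketed strings; a minimal sketch of that case (the name `df_lists` is mine, for illustration):

```python
import pandas as pd

# here Names already contains Python lists, so no string parsing is needed
df_lists = pd.DataFrame({'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
                         'Names': [['broccoli', 'ginger'], ['apple', 'grapes', 'dragonfruit'],
                                   ['pine'], ['bottleguord', 'pumpkin'], ['mango', 'guava']]})

# explode to one name per row, then collect each group into a set
result = (df_lists.explode('Names')
                  .groupby('category')['Names']
                  .apply(set))
print(result)
```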
assuming strings as input:
(df.assign(Names=df['Names'].str[1:-1].str.split(', '))
.explode('Names')
.groupby('category')
['Names']
 .apply(lambda x: '[' + ', '.join(set(x)) + ']')
)
output:
category
fruit [dragonfruit, guava, grapes, apple, mango]
vegetables [ginger, pine, broccoli, bottleguord, pumpkin]
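Since set iteration order is not deterministic, the joined strings above can come out in any order between runs. A hedged variant that sorts the unique names for reproducible output (the variable name `uniques` is mine):

```python
import pandas as pd

df = pd.DataFrame({'category': ['vegetables', 'fruit', 'vegetables', 'vegetables', 'fruit'],
                   'Names': ['[broccoli, ginger]', '[apple, grapes, dragonfruit]',
                             '[pine]', '[bottleguord, pumpkin]', '[mango, guava]']})

# parse the bracketed strings, explode, then collect a sorted list of
# unique names per category for deterministic ordering
uniques = (df.assign(Names=df['Names'].str[1:-1].str.split(', '))
             .explode('Names')
             .groupby('category')['Names']
             .apply(lambda x: sorted(set(x))))
print(uniques)
```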