Home > database >  Capture all unique information by group
Capture all unique information by group

Time:11-23

I want to create a unique dataset of fruits. I don't know all the types (e.g. colour store, price) that could be under each fruit. For each type, there could also be duplicate rows. Is there a way to detect all possible duplicates and capture all unique informoation in a fully generalisable way?

   type    val       detail
0 fruit    apple
1 colour   green     greenish
2 colour   yellow    
3 store    walmart    usa
4 price    10
5 NaN
6 fruit    banana
7 colour   yellow
8 fruit    pear
9 fruit    jackfruit
...

Expected Output

   fruit      colour            store    price       detail           ...
0  apple     [green, yellow ]  [walmart]  [10]      [greenish, usa] 
1  banana     [yellow]           NaN      NaN
2  pear        NaN               NaN      NaN    
3  jackfruit   NaN               NaN      NaN    

I tried. But this does not get close to the expected output. It does not show the colum names either.

df.groupby("type")["val"].agg(size=len, set=lambda x: set(x))
0 fruit   {"apple",...}
1 colour  ...

CodePudding user response:

First is created fruit column with val values if type is fruit, replace non matched values to NaNs and forward filling missing values, then pivoting by DataFrame.pivot_table with custom function for unique values without NaNs and then flatten MultiIndex:

m = df['type'].eq('fruit')

df['fruit'] = df['val'].where(m).ffill()

df1 = (df.pivot_table(index='fruit',columns='type', 
                      aggfunc=lambda x: list(dict.fromkeys(x.dropna())))
        .drop('fruit', axis=1, level=1))
df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}')
print (df1)
          detail_colour detail_price detail_store       val_colour val_price  \
fruit                                                                          
apple        [greenish]           []        [usa]  [green, yellow]      [10]   
banana               []          NaN          NaN         [yellow]       NaN   
jackfruit           NaN          NaN          NaN              NaN       NaN   
pear                NaN          NaN          NaN              NaN       NaN   

           val_store  
fruit                 
apple      [walmart]  
banana           NaN  
jackfruit        NaN  
pear             NaN  
  • Related