Python pandas: how to add a unique identifier for groups of similar data

Time:06-02

Here is my DataFrame:

                product_title                               variation_list  
 Chauvet DJ GigBar Move Effect Light System                ['Black', 'White']
 Rane Twelve MKII DJ Controller                            ['New', 'Blemished']

My expected DataFrame should look like this:

group_id                 product_title                            variation_list  unique_id  
FAT-1301    Chauvet DJ GigBar Move Effect Light System             Black           FAT-01
FAT-1301    Chauvet DJ GigBar Move Effect Light System             White           FAT-02 
FAT-1302       Rane Twelve MKII DJ Controller                      New             FAT-03
FAT-1302       Rane Twelve MKII DJ Controller                      Blemished       FAT-04

Basically, I want to add two extra columns: group_id, which assigns the same ID to every row that belongs to the same product, and unique_id, which assigns a distinct value to every row.
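A minimal sketch of the requested transformation, assuming the numbering scheme can be read off the expected output: group IDs start at 1301 and unique IDs start at 1, zero-padded to two digits. GROUP_START and UNIQUE_WIDTH are hypothetical names chosen for illustration; the question does not say where 1301 comes from.

```python
import pandas as pd

# The question's input: one row per product, variations stored as a list
df = pd.DataFrame({
    "product_title": [
        "Chauvet DJ GigBar Move Effect Light System",
        "Rane Twelve MKII DJ Controller",
    ],
    "variation_list": [["Black", "White"], ["New", "Blemished"]],
})

GROUP_START = 1301  # hypothetical: first group number, taken from the expected output
UNIQUE_WIDTH = 2    # hypothetical: zero-padding width for unique IDs ("01", "02", ...)

# Number each original row *before* exploding, so all of a product's
# variations inherit the same group number.
group_num = pd.Series(range(GROUP_START, GROUP_START + len(df)), index=df.index)
df["group_id"] = "FAT-" + group_num.astype(str)

# One row per variation; ignore_index=True gives a fresh 0..n-1 index.
out = df.explode("variation_list", ignore_index=True)

# Running counter over all exploded rows, zero-padded to UNIQUE_WIDTH digits.
counter = pd.Series(range(1, len(out) + 1), index=out.index)
out["unique_id"] = "FAT-" + counter.astype(str).str.zfill(UNIQUE_WIDTH)

out = out[["group_id", "product_title", "variation_list", "unique_id"]]
print(out)
```

The key design choice is assigning group_id before explode(): the value is then copied to every variation of that product automatically, so no grouping step is needed afterwards.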

Answer:

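import pandas as pd

df2 = df.reset_index().explode('variation_list')
# group_id: the original row number (kept in the "index" column) is shared by all its variations
df2['group_id'] = 'FAT' + df2['index'].add(1).astype(str)
# unique_id: a running counter over every exploded row
df2['unique_id'] = 'FAT' + (df2.reset_index(drop=True).index + 1).astype(str)
df2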

   index                                product_title  ... group_id unique_id
0      0   Chauvet DJ GigBar Move Effect Light System  ...     FAT1      FAT1
0      0   Chauvet DJ GigBar Move Effect Light System  ...     FAT1      FAT2
1      1   Chauvet DJ GigBar Move Effect Light System  ...     FAT2      FAT3
1      1   Chauvet DJ GigBar Move Effect Light System  ...     FAT2      FAT4
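A self-contained variant of the approach above (reset_index, explode, then build both IDs), using the question's two rows so it can be run as is. The DataFrame construction is added here for illustration, and the unique IDs are assigned from a plain list so that explode()'s duplicated index (0, 0, 1, 1) cannot interfere.

```python
import pandas as pd

# Input built from the question's two rows (constructed here for illustration)
df = pd.DataFrame({
    "product_title": [
        "Chauvet DJ GigBar Move Effect Light System",
        "Rane Twelve MKII DJ Controller",
    ],
    "variation_list": [["Black", "White"], ["New", "Blemished"]],
})

# reset_index() stores the original row number in an "index" column,
# which survives explode() and therefore identifies the group.
df2 = df.reset_index().explode("variation_list")
df2["group_id"] = "FAT" + df2["index"].add(1).astype(str)

# A list is assigned positionally, ignoring the duplicated index.
df2["unique_id"] = [f"FAT{i}" for i in range(1, len(df2) + 1)]

# Drop the helper column and renumber the rows 0..n-1.
df2 = df2.drop(columns="index").reset_index(drop=True)
print(df2)
```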

Answer:

Using explode:

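import pandas as pd

# Test data: two products, each with a list of variations
d = {'product_title': ['Chauvet DJ GigBar Move Effect Light System', 'Chauvet DJ GigBar Move Effect Light System'],
     'variation_list': [['Black', 'White'], ['New', 'Blemished']]}

df = pd.DataFrame(d)
# Number each original row before exploding, so its variations share the ID
df.insert(0, "group_id", df.index + 1)
# One row per variation; reset_index(drop=True) gives a fresh 0..n-1 index
df = df.explode('variation_list').reset_index(drop=True)
# Running counter over every exploded row, placed as the last column
df.insert(3, "unique_id", df.index + 1)
# Add the "FAT-" prefix to both ID columns
df['group_id'] = 'FAT-' + df['group_id'].astype(str)
df['unique_id'] = 'FAT-' + df['unique_id'].astype(str)
print(df)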

Output:

   group_id            product_title                variation_list unique_id
0   FAT-1   Chauvet DJ GigBar Move Effect Light System  Black       FAT-1
1   FAT-1   Chauvet DJ GigBar Move Effect Light System  White       FAT-2
2   FAT-2   Chauvet DJ GigBar Move Effect Light System  New         FAT-3
3   FAT-2   Chauvet DJ GigBar Move Effect Light System  Blemished   FAT-4
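Both answers number groups by row position, so in the test data above two rows with the identical title still get FAT-1 and FAT-2. If the requirement is instead that rows sharing a product_title share a group_id (an assumption; the question does not say which is wanted), pd.factorize can derive the group number from the title itself. A minimal sketch:

```python
import pandas as pd

# Same test data as above: the identical title appears twice
df = pd.DataFrame({
    "product_title": [
        "Chauvet DJ GigBar Move Effect Light System",
        "Chauvet DJ GigBar Move Effect Light System",
    ],
    "variation_list": [["Black", "White"], ["New", "Blemished"]],
})

out = df.explode("variation_list", ignore_index=True)

# factorize returns codes 0, 1, 2, ... for each distinct title in order of
# first appearance, so identical titles map to the same code.
codes, _ = pd.factorize(out["product_title"])
out["group_id"] = [f"FAT-{c + 1}" for c in codes]

# Running counter over every exploded row
out["unique_id"] = [f"FAT-{i}" for i in range(1, len(out) + 1)]
print(out)
```

With this data all four rows end up in group FAT-1; with the question's two distinct titles the result would match the positional approach.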