I have a dataframe with a product_type column that has duplicate substrings within strings:
df_1
product_type
bag,bag
tote bag,bag
handbag,handbag
I'm using this line to remove the duplicate substrings and store the result in a new column "unique_type":
df_1['unique_type'] = [set(sub.split(',')) for sub in df_1["product_type"]]
This is what the new dataframe looks like
current output
product_type unique_type
bag,bag {'bag'}
tote bag, bag {'tote bag', 'bag'}
{''}
handbag, handbag {'handbag'}
The problem is that the values in the new column unique_type are sets, so they are displayed with curly brackets and quotation marks. I would like to produce a column of plain strings without curly brackets and quotation marks, like so:
desired output
product_type unique_type
bag,bag bag
tote bag, bag tote bag, bag
handbag, handbag handbag
CodePudding user response:
Add join:
df_1['unique_type'] = [', '.join(set(sub.split(','))) for sub in df_1["product_type"]]
Or, if you need to keep the original order of the values, use the dict.fromkeys trick:
df_1['unique_type1'] = [', '.join(dict.fromkeys(sub.split(',')))
                        for sub in df_1["product_type"]]
print (df_1)
      product_type    unique_type   unique_type1
0          bag,bag            bag            bag
1     tote bag,bag  bag, tote bag  tote bag, bag
2
3  handbag,handbag        handbag        handbag
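
If the values can contain spaces after the commas (as in "tote bag, bag") or be empty, a small helper may be easier to reuse. The following is only a sketch in plain Python: the function name dedupe_substrings and the sample list product_types are made up for illustration, and it assumes that surrounding whitespace is not significant and that empty parts should be dropped.

```python
def dedupe_substrings(text, sep=","):
    """Return text with duplicate sep-separated parts removed, first-seen order kept."""
    # Strip whitespace so "tote bag, bag" and "tote bag,bag" are treated alike.
    parts = [part.strip() for part in text.split(sep)]
    # dict.fromkeys keeps the first occurrence of each part, in order;
    # the generator skips empty parts coming from empty strings.
    unique_parts = dict.fromkeys(part for part in parts if part)
    return ", ".join(unique_parts)


# Hypothetical sample values mirroring the question's column.
product_types = ["bag,bag", "tote bag, bag", "", "handbag,handbag"]
unique_types = [dedupe_substrings(value) for value in product_types]
print(unique_types)  # ['bag', 'tote bag, bag', '', 'handbag']
```

On the dataframe, the same helper could be applied with df_1["product_type"].map(dedupe_substrings), assuming the column holds only strings.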