print(dfs["Categorias"])
I'm getting this:
0 wordpress, criação de sites
1 criação de sites
2 e-commerce, criação de sites, wordpress
3 marketing digital, vendas
How can I remove the repeated items and collect the unique values in a list?
Thank you
CodePudding user response:
You could use sets and itertools.chain:
from itertools import chain
set(chain(*df['Categorias'].str.split(r',\s*')))
Output:
{'criação de sites', 'e-commerce', 'marketing digital', 'vendas', 'wordpress'}
Optionally, as a list (a set is unordered, so the element order may differ):
>>> list(set(chain(*df['Categorias'].str.split(r',\s*'))))
['criação de sites', 'e-commerce', 'marketing digital', 'vendas', 'wordpress']
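For reference, here is a minimal self-contained sketch of the same idea. The DataFrame below is reconstructed from the printed output in the question, so its exact construction is an assumption. It uses chain.from_iterable instead of unpacking with *, and sorted() to get a stable order out of the set.

```python
from itertools import chain

import pandas as pd

# Sample data reconstructed from the question's printed output (an assumption).
df = pd.DataFrame({
    "Categorias": [
        "wordpress, criação de sites",
        "criação de sites",
        "e-commerce, criação de sites, wordpress",
        "marketing digital, vendas",
    ]
})

# Split each row on a comma followed by optional whitespace.
split_rows = df["Categorias"].str.split(r",\s*")

# Flatten the per-row lists, deduplicate with a set, then sort for a stable order.
unique_categories = sorted(set(chain.from_iterable(split_rows)))
print(unique_categories)
```

Sorting is only one way to make the result reproducible; if the order of first appearance matters, a set is the wrong tool.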
CodePudding user response:
Are you looking for something like this?
Split each row into a list, explode that list into rows, then get the unique values of the column. Unlike a set, unique() keeps the values in order of first appearance.
>>> df['Categorias'].str.split(r',\s*').explode().unique().tolist()
['wordpress', 'criação de sites', 'e-commerce', 'marketing digital', 'vendas']
Step by step:
>>> df = df['Categorias'].str.split(r',\s*')
>>> df
0 [wordpress, criação de sites]
1 [criação de sites]
2 [e-commerce, criação de sites, wordpress]
3 [marketing digital, vendas]
Name: Categorias, dtype: object
>>> df = df.explode()
>>> df
0 wordpress
0 criação de sites
1 criação de sites
2 e-commerce
2 criação de sites
2 wordpress
3 marketing digital
3 vendas
Name: Categorias, dtype: object
>>> df.unique().tolist()
['wordpress', 'criação de sites', 'e-commerce', 'marketing digital', 'vendas']
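If the real column has missing values or inconsistent spacing around the commas, the same split/explode/unique chain can be made a bit more tolerant. This is only a sketch: the input Series below is hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical input with a missing value and inconsistent spacing.
s = pd.Series(
    ["wordpress ,criação de sites", None, " vendas, wordpress"],
    name="Categorias",
)

unique_categories = (
    s.dropna()        # skip rows that have no categories
     .str.split(",")  # split on the bare comma
     .explode()       # one category per row
     .str.strip()     # trim whitespace around each category
     .unique()        # keep first occurrences, in order
     .tolist()
)
print(unique_categories)
```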
CodePudding user response:
One way is to convert the dataframe column to a list, remove duplicates using a set and then join them using string operations.
>>> ', '.join(set(df['Categorias'].str.split(', ').explode().tolist()))
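Because a set has no defined order, the joined string above can come out in a different order from run to run. If a stable order is wanted, one option (a sketch, using sample data reconstructed from the question) is to replace the set with unique(), which keeps the order of first appearance:

```python
import pandas as pd

# Sample data reconstructed from the question's printed output (an assumption).
df = pd.DataFrame({
    "Categorias": [
        "wordpress, criação de sites",
        "criação de sites",
        "e-commerce, criação de sites, wordpress",
        "marketing digital, vendas",
    ]
})

# unique() drops duplicates while preserving first-seen order.
unique_categories = df["Categorias"].str.split(r",\s*").explode().unique()

# Join the unique values back into a single comma-separated string.
joined = ", ".join(unique_categories)
print(joined)
```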