I am trying to add my labels to a set object, but when I do this I get a weird output. I want the set to contain every label, with no repeats.
types = set()
for t in frame4['practice']:
    types.update(t)
types
{'1',
'3',
'A',
'B',
'C',
'D',
'E',
'F',
'G',
'I',
'L',
'M',
'N',
'O',
'P',
'S',
'T',
'W',
'Z',
'_',
'a',
'b',
'c',
'd',
'e',
'f',
'g',
'h',
'i',
'k',
'l',
'm',
'n',
'o',
'p',
'r',
's',
't',
'u',
'v',
'w',
'y'}
This is what the practice column of the dataframe looks like. There are some repetitions since the values are labels, and all NaN elements were removed.
2 Identifier_Cookie_or_similar_Tech_1stParty
3 Identifier_IP_Address_1stParty
4 Identifier_Cookie_or_similar_Tech_1stParty
8 Identifier_Cookie_or_similar_Tech_3rdParty
10 Demographic_3rdParty
...
21612 Demographic_1stParty
21613 Demographic_3rdParty
21614 Identifier_Cookie_or_similar_Tech_1stParty
21615 Identifier_Cookie_or_similar_Tech_3rdParty
21616 Identifier_Cookie_or_similar_Tech_1stParty
Name: practice, Length: 10201, dtype: object
CodePudding user response:
update() needs an iterable of values, so wrap the string in a list:
types.update( [t] )
When you pass a single string, update() treats it as an iterable of characters and adds each character separately.
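To illustrate the difference, here is a minimal sketch using two of the labels from the question. The names labels, types_wrong and types_right are made up for this example.

```python
# Illustrative names only; the labels are taken from the question's output.
labels = ['Demographic_1stParty', 'Demographic_3rdParty', 'Demographic_1stParty']

types_wrong = set()
types_right = set()

for label in labels:
    types_wrong.update(label)    # iterates over the string, adds single characters
    types_right.update([label])  # iterates over a one-item list, adds the whole label
    # types_right.add(label) would have the same effect

print(sorted(types_right))
# ['Demographic_1stParty', 'Demographic_3rdParty']
print('D' in types_wrong, 'Demographic_1stParty' in types_wrong)
# True False
```

For a single value, set.add() is probably the more readable choice; update() is meant for adding several values at once.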
You can even do it without a for-loop:
types.update( frame4['practice'] )
or build the set directly:
types = set( frame4['practice'] )
You can also skip set() entirely and use .unique(), which returns an array of the distinct values:
types = frame4['practice'].unique()
And if you want to remove duplicate values but keep the result as a Series, use .drop_duplicates():
practice_unique = df['practice'].drop_duplicates(keep='last')
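As a rough sketch of how .unique() and .drop_duplicates() differ, assume a small made-up Series. The variable names are illustrative. .unique() returns only the values, while .drop_duplicates() returns a Series that keeps the original index, and keep= decides which occurrence survives.

```python
import pandas as pd

# Made-up sample data for illustration
labels = pd.Series(['abc', 'xyz', 'qrt', 'abc', '123', 'qrt'], name='practice')

unique_values = labels.unique()                   # array of values, in order of first appearance
first_kept = labels.drop_duplicates()             # keep='first' is the default
last_kept = labels.drop_duplicates(keep='last')   # keeps the last occurrence instead

print(list(unique_values))     # ['abc', 'xyz', 'qrt', '123']
print(list(first_kept.index))  # [0, 1, 2, 4]
print(list(last_kept.index))   # [1, 3, 4, 5]
```

If only the distinct labels matter, set() or .unique() is enough; .drop_duplicates() is mainly useful when the index of the remaining rows is still needed.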
Minimal working example:
import pandas as pd
df = pd.DataFrame({
'practice': ['abc', 'xyz', 'qrt', 'abc', '123', 'qrt']
})
print('--- 1 ---')
types = set( df['practice'] )
print(types)
print('--- 2 ---')
types = set()
types.update( df['practice'] )
print(types)
print('--- 3 ---')
types = df['practice'].unique()
print(types)
print('--- 4 ---')
practice_unique = df['practice'].drop_duplicates(keep='last')  # Series, not a DataFrame
print(practice_unique)
Result:
--- 1 ---
{'qrt', 'abc', 'xyz', '123'}
--- 2 ---
{'qrt', 'abc', 'xyz', '123'}
--- 3 ---
['abc' 'xyz' 'qrt' '123']
--- 4 ---
1 xyz
3 abc
4 123
5 qrt
Name: practice, dtype: object