Adding elements from dataframe to a set object

Time:07-24

I am trying to add my labels to a set object, but when I do this I get an unexpected output: the set contains individual characters instead of whole labels. I want the set to contain every label exactly once, with no repeats.

types = set()
for t in frame4['practice']:
    types.update(t)
types
{'1',
 '3',
 'A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'I',
 'L',
 'M',
 'N',
 'O',
 'P',
 'S',
 'T',
 'W',
 'Z',
 '_',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'y'}

This is what the practice column of the dataframe looks like. There are some repetitions since these are labels, and all NaN elements were removed.

2        Identifier_Cookie_or_similar_Tech_1stParty
3                    Identifier_IP_Address_1stParty
4        Identifier_Cookie_or_similar_Tech_1stParty
8        Identifier_Cookie_or_similar_Tech_3rdParty
10                             Demographic_3rdParty
                            ...                    
21612                          Demographic_1stParty
21613                          Demographic_3rdParty
21614    Identifier_Cookie_or_similar_Tech_1stParty
21615    Identifier_Cookie_or_similar_Tech_3rdParty
21616    Identifier_Cookie_or_similar_Tech_1stParty
Name: practice, Length: 10201, dtype: object

Answer:

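set.update() expects an iterable of values, so wrap the single label in a list:

types.update([t])

When you pass a single string, update() iterates over it, so the string is treated as a sequence of characters and each character is added separately.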
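To see the difference, here is a small sketch using hypothetical sample labels in place of frame4['practice']: passing a bare string to update() adds its characters, while wrapping it in a list, or using set.add() for a single element, keeps the whole label.

```python
# Hypothetical sample labels, standing in for frame4['practice']
labels = ["Demographic_1stParty", "Demographic_3rdParty", "Demographic_1stParty"]

# Buggy: update() iterates the string, adding each character separately
chars = set()
for t in labels:
    chars.update(t)

# Fixed: wrap the string in a list so update() adds the whole label
whole = set()
for t in labels:
    whole.update([t])

# Equivalent and more idiomatic for one element: set.add()
added = set()
for t in labels:
    added.add(t)

print(sorted(whole))  # ['Demographic_1stParty', 'Demographic_3rdParty']
print("D" in chars)   # True
```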


You can also do it without a for loop:

types.update( frame4['practice'] )

or build the set directly:

types = set( frame4['practice'] )
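Both of these work because iterating over a pandas Series yields its values, not its index. Iterating over a whole DataFrame yields its column labels instead, so the column has to be selected first. A minimal sketch with a hypothetical small frame standing in for frame4:

```python
import pandas as pd

# Hypothetical frame standing in for frame4
frame = pd.DataFrame({"practice": ["abc", "xyz", "abc"], "other": [1, 2, 3]})

# Iterating a Series yields its values
from_series = set(frame["practice"])

# Iterating a DataFrame yields its column labels, not row values
from_frame = set(frame)

print(from_series == {"abc", "xyz"})        # True
print(from_frame == {"practice", "other"})  # True
```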

You can even skip set() and use .unique() instead:

types = frame4['practice'].unique()
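Unlike a set, .unique() returns a NumPy array and keeps the values in order of first appearance. The sketch below uses hypothetical data to show converting it to a plain list, or getting a sorted list from a set instead; the dropna() call only matters if NaN values had not already been removed, as they were in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a repeat and a missing value
s = pd.Series(["xyz", "abc", "xyz", np.nan], name="practice")

# unique() keeps first-appearance order and would include NaN
with_nan = s.unique()

# Drop missing values first, then convert the array to a list
ordered = s.dropna().unique().tolist()

# A set has no order; sort it if you need a stable result
alphabetical = sorted(set(s.dropna()))

print(len(with_nan))  # 3
print(ordered)        # ['xyz', 'abc']
print(alphabetical)   # ['abc', 'xyz']
```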

And if you want to remove duplicate values but keep the result as a Series (with its original index), use .drop_duplicates():

types = frame4['practice'].drop_duplicates(keep='last')
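drop_duplicates() keeps the original index labels, and the keep argument decides which occurrence survives. A sketch on hypothetical data comparing keep='first' (the default) with keep='last':

```python
import pandas as pd

# Hypothetical column with one repeated label
s = pd.Series(["abc", "xyz", "abc"], name="practice")

first = s.drop_duplicates()            # keep='first' is the default
last = s.drop_duplicates(keep="last")

print(first.index.tolist())  # [0, 1]
print(last.index.tolist())   # [1, 2]

# reset_index(drop=True) renumbers the rows if the old labels are not needed
print(last.reset_index(drop=True).tolist())  # ['xyz', 'abc']
```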

Minimal working example:

import pandas as pd

df = pd.DataFrame({
    'practice': ['abc', 'xyz', 'qrt', 'abc', '123', 'qrt']
})

print('--- 1 ---')
types = set( df['practice'] )
print(types)

print('--- 2 ---')
types = set()
types.update( df['practice'] )
print(types)

print('--- 3 ---')
types = df['practice'].unique()
print(types)

print('--- 4 ---')
types = df['practice'].drop_duplicates(keep='last')
print(types)

Result (the order of elements in a set is arbitrary and may differ between runs):

--- 1 ---
{'qrt', 'abc', 'xyz', '123'}
--- 2 ---
{'qrt', 'abc', 'xyz', '123'}
--- 3 ---
['abc' 'xyz' 'qrt' '123']
--- 4 ---
1    xyz
3    abc
4    123
5    qrt
Name: practice, dtype: object