'set' does not work to get unique values in a column of lists in pandas

I am not sure why 'set' does not get unique values in the following example:

import pandas as pd

df6 = pd.DataFrame({
    'Name': ['Sara', 'John'],
    'one': ['UK', 'UK'],
    'two': ['IN', 'SA'],
    'three': ['IN', 'IN'],
    'four': ['IN', 'US']
})

df6

gives:

    Name    one     two    three    four
0   Sara    UK      IN     IN       IN
1   John    UK      SA     IN       US

I concatenated columns one through four into a list:

df6['Concat'] = df6[['one','two','three','four']].apply(lambda x: [', '.join(x[x.notnull()])], axis = 1)

gives:

    Name    one     two    three    four    Concat
0   Sara    UK      IN     IN       IN      [UK, IN, IN, IN]
1   John    UK      SA     IN       US      [UK, SA, IN, US]

Now I want to get the unique values only in the Concat column for each name:

I tried the following:

df6.Concat.apply(set)

but the result is the same as the original list!

0    {UK, IN, IN, IN}
1    {UK, SA, IN, US}
Name: Concat, dtype: object

Why does 'set' not work in this case?

I do not need the unique values to be ordered, but for my own learning: how could I get them in order?

CodePudding user response:

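Each value in your Concat column is a list that holds a single string (for example ['UK, IN, IN, IN']), not a list of four separate strings. The ', '.join(...) call merges the values into one string before they are wrapped in the list, and pandas prints that string without quotes, so it only looks like a list of four items. When you apply set() to a one-element list, you get a set containing that one string. Apply set() to the original data columns instead: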

df6[['one','two','three','four']].apply(set, axis=1)
#0            {IN, UK}
#1    {SA, IN, UK, US}

The parameter axis=1 instructs apply() to apply set() row-wise.
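
As for the ordering part of the question, here is a minimal sketch rather than a definitive approach. It rebuilds the frame from the question, shows that each Concat cell holds one joined string, and then produces the unique values per row in two orders. The names cols, concat, first_seen and alphabetical are made up for this illustration, and dropna() stands in for the notnull() filter used in the question.

```python
import pandas as pd

df6 = pd.DataFrame({
    'Name': ['Sara', 'John'],
    'one': ['UK', 'UK'],
    'two': ['IN', 'SA'],
    'three': ['IN', 'IN'],
    'four': ['IN', 'US'],
})
cols = ['one', 'two', 'three', 'four']  # hypothetical helper name

# What the question built: every cell is a list holding ONE joined string,
# so set() on that list can only ever contain that single string.
concat = df6[cols].apply(lambda x: [', '.join(x[x.notnull()])], axis=1)
print(concat[0])       # ['UK, IN, IN, IN']
print(set(concat[0]))  # {'UK, IN, IN, IN'}

# Unique values in first-seen order: dict keys keep insertion order
# (guaranteed since Python 3.7), so duplicates drop out without reordering.
first_seen = df6[cols].apply(
    lambda row: list(dict.fromkeys(row.dropna())), axis=1
)

# Unique values in alphabetical order: deduplicate with set(), then sort.
alphabetical = df6[cols].apply(
    lambda row: sorted(set(row.dropna())), axis=1
)

print(first_seen.tolist())    # [['UK', 'IN'], ['UK', 'SA', 'IN', 'US']]
print(alphabetical.tolist())  # [['IN', 'UK'], ['IN', 'SA', 'UK', 'US']]
```

A plain set has no defined order, which is why its printed order can differ between runs. Converting to a list through dict.fromkeys() or sorted() gives a stable result.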