I am not sure why set() does not return unique values in the following example:
import pandas as pd

df6 = pd.DataFrame({
    'Name': ['Sara', 'John'],
    'one': ['UK', 'UK'],
    'two': ['IN', 'SA'],
    'three': ['IN', 'IN'],
    'four': ['IN', 'US']
})
df6
gives:
Name one two three four
0 Sara UK IN IN IN
1 John UK SA IN US
I concatenated columns one to four into a list:
df6['Concat'] = df6[['one','two','three','four']].apply(lambda x: [', '.join(x[x.notnull()])], axis = 1)
gives:
Name one two three four Concat
0 Sara UK IN IN IN [UK, IN, IN, IN]
1 John UK SA IN US [UK, SA, IN, US]
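A quick check of what a single Concat cell actually holds may help here. The sketch below re-creates the frame and the Concat construction from above; the name `cell` is just an illustrative variable. The pandas display hides the quotes, but the cell is a list with one element, the comma-joined string, not four separate codes.

```python
import pandas as pd

df6 = pd.DataFrame({
    'Name': ['Sara', 'John'],
    'one': ['UK', 'UK'],
    'two': ['IN', 'SA'],
    'three': ['IN', 'IN'],
    'four': ['IN', 'US'],
})

# Same construction as above: join() builds ONE string,
# which is then wrapped in a one-element list.
df6['Concat'] = df6[['one', 'two', 'three', 'four']].apply(
    lambda x: [', '.join(x[x.notnull()])], axis=1
)

cell = df6.loc[0, 'Concat']  # illustrative variable name
print(repr(cell))  # ['UK, IN, IN, IN']
print(len(cell))   # 1
```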
Now I want to keep only the unique values of the Concat column for each name.
I tried the following:
df6.Concat.apply(set)
but the result is the same as the original list!
0 {UK, IN, IN, IN}
1 {UK, SA, IN, US}
Name: Concat, dtype: object
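The same behaviour can be reproduced in plain Python, without pandas. This is a minimal sketch; `cell` stands in for Sara's Concat value and `codes` is an illustrative name. set() only removes duplicates among a list's elements, and this list has just one element.

```python
# Stand-in for Sara's Concat cell: a list with ONE string inside.
cell = ['UK, IN, IN, IN']

print(set(cell))       # {'UK, IN, IN, IN'}: one element, nothing to drop
print(len(set(cell)))  # 1

# To deduplicate the country codes, the string has to be split
# back into separate values first (illustrative variable name).
codes = cell[0].split(', ')
print(sorted(set(codes)))  # ['IN', 'UK']
```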
Why does set() not work in this case?
I do not need the unique values to be ordered, but to further my learning: how can I get them in order?
Answer:
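Each cell of your Concat column is a list holding a single string. ', '.join() glues the four values into one string, 'UK, IN, IN, IN', before it is wrapped in the list, so the cell is not a list of four country codes. Applying set() to a one-element list gives a set containing that one string, and there is nothing to deduplicate. You should apply set() to the original data columns instead: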
df6[['one','two','three','four']].apply(set, axis=1)
#0 {IN, UK}
#1 {SA, IN, UK, US}
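If you would rather keep a Concat column, one option is to build it as a list of separate values instead of one joined string, after which set() deduplicates as expected. This is a sketch under the assumption that a list-valued column is what you want; `cols` and the `Unique` column are hypothetical names, and `dropna()` is swapped in as an equivalent of `x[x.notnull()]`.

```python
import pandas as pd

df6 = pd.DataFrame({
    'Name': ['Sara', 'John'],
    'one': ['UK', 'UK'],
    'two': ['IN', 'SA'],
    'three': ['IN', 'IN'],
    'four': ['IN', 'US'],
})

cols = ['one', 'two', 'three', 'four']  # hypothetical helper name

# dropna() drops missing values like x[x.notnull()]; tolist() keeps each
# value as its own list element instead of joining them into one string.
df6['Concat'] = df6[cols].apply(lambda x: x.dropna().tolist(), axis=1)

# set() now sees four separate strings and removes the duplicates.
# 'Unique' is a hypothetical column name.
df6['Unique'] = df6['Concat'].apply(set)
print(df6[['Name', 'Concat', 'Unique']])
```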
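The parameter axis=1 makes apply() call set() on each row (row-wise) instead of on each column. Sets are unordered, so the order in which their elements are printed is not guaranteed.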
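For the follow-up about ordered unique values, "ordered" can mean two things, so the sketch below shows both; `cols`, `first_seen` and `alphabetical` are illustrative names. `dict.fromkeys()` keeps the first occurrence of each value in insertion order (guaranteed for dicts since Python 3.7), while `sorted(set(...))` gives alphabetical order.

```python
import pandas as pd

df6 = pd.DataFrame({
    'Name': ['Sara', 'John'],
    'one': ['UK', 'UK'],
    'two': ['IN', 'SA'],
    'three': ['IN', 'IN'],
    'four': ['IN', 'US'],
})

cols = ['one', 'two', 'three', 'four']  # illustrative name

# Order of first appearance: dict keys are unique and keep insertion order.
first_seen = df6[cols].apply(lambda row: list(dict.fromkeys(row)), axis=1)

# Alphabetical order: deduplicate with set(), then sort.
alphabetical = df6[cols].apply(lambda row: sorted(set(row)), axis=1)

print(first_seen[0])    # ['UK', 'IN']
print(alphabetical[0])  # ['IN', 'UK']
```

pd.unique(row) is another option for order of appearance; it returns a NumPy array rather than a list.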