Split a comma-delimited Pandas column of type object

I have a pandas DataFrame with a column that holds comma-separated values, like so:

| ID       | home_page                                       |
| ---------| ------------------------------------------------|
| 1        | facebook.com, facebook.com, meta.com            |
| 2        | amazon.com                                      |
| 3        | twitter.com, dev.twitter.com, twitter.com       |

I want to create a new column that contains the unique values from the home_page column. The final output should be:

| ID       | home_page                                       | unique                      |
| -------- | ----------------------------------------------- | --------------------------- |
| 1        | facebook.com, facebook.com, meta.com            | facebook.com,meta.com       |
| 2        | amazon.com                                      | amazon.com                  |
| 3        | twitter.com, dev.twitter.com, twitter.com       | twitter.com,dev.twitter.com |

I tried the following:

final["home_page"] = final["home_page"].str.split(',').apply(lambda x : ','.join(set(x)))

But when I do that, I get:

TypeError: 'float' object is not iterable
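One common way to get this exact error is a missing value somewhere in the column. The sketch below assumes a small hypothetical Series with one NaN: `.str.split` leaves the missing entry as NaN (a float), and `set()` then fails on it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one normal row and one missing value
s = pd.Series(["facebook.com, meta.com", np.nan])

# .str.split leaves missing values as they are, so row 1 is still NaN
split = s.str.split(",")
print(type(split[1]))  # <class 'float'> with the default object dtype

# set() needs an iterable; a float is not one
try:
    set(split[1])
except TypeError as exc:
    print(exc)  # 'float' object is not iterable
```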

The column has no NaN values, but just in case I tried:

final["home_page"] = final["home_page"].str.split(',').apply(lambda x : ','.join(set(x)))

But the entire column comes back empty when I do that.

Answer:

You are right that this comes from np.nan values, which are of type float. The error is raised at set(np.nan), because a float is not iterable. The following should work for you (and may also be faster):

import numpy as np
df["home_page"].str.replace(' ', '').str.split(',').apply(np.unique)
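A runnable sketch of the line above, using sample data copied from the question's table. It also shows why the space-removal step matters: splitting on "," alone keeps the space after each comma, so "facebook.com" and " facebook.com" are treated as different values. Note that `np.unique` returns each row's values sorted, not in their original order.

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question's table
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "home_page": [
        "facebook.com, facebook.com, meta.com",
        "amazon.com",
        "twitter.com, dev.twitter.com, twitter.com",
    ],
})

# Without removing spaces, "facebook.com" and " facebook.com" count as different values
no_strip = df["home_page"].str.split(",").apply(np.unique)
print(no_strip[0].tolist())  # [' facebook.com', ' meta.com', 'facebook.com']

# Drop spaces, split on commas, then keep each row's unique (sorted) values
uniques = df["home_page"].str.replace(" ", "").str.split(",").apply(np.unique)
for arr in uniques:
    print(arr.tolist())
# ['facebook.com', 'meta.com']
# ['amazon.com']
# ['dev.twitter.com', 'twitter.com']
```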

If you want a string rather than an array at the end, append the following:

.apply(lambda x: ','.join(str(i) for i in x))
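Because `np.unique` sorts, the approach above gives dev.twitter.com,twitter.com for ID 3, not the twitter.com,dev.twitter.com shown in the expected output. If first-seen order matters, one alternative (not part of the answer above) is to deduplicate with `dict.fromkeys`, which keeps insertion order. This sketch writes the result to a new `unique` column and leaves missing values missing; the helper name `unique_in_order` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "home_page": [
        "facebook.com, facebook.com, meta.com",
        "amazon.com",
        "twitter.com, dev.twitter.com, twitter.com",
        np.nan,  # a missing value, to show how it is handled
    ],
})

def unique_in_order(value):
    """Return the comma-separated unique entries of value, in first-seen order."""
    if not isinstance(value, str):  # NaN/None: leave the row missing
        return value
    parts = [p.strip() for p in value.split(",")]
    # dict keys keep insertion order (Python 3.7+), so repeats are dropped
    # without sorting, unlike np.unique
    return ",".join(dict.fromkeys(parts))

# Store the result in a new column instead of overwriting home_page
df["unique"] = df["home_page"].apply(unique_in_order)
print(df["unique"].tolist())
```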