Split a comma delimited Pandas Column of Type Object

Time:05-15

I have a pandas DataFrame with a column that holds a mix of values, like so:

| ID       | home_page                                       |
| ---------| ------------------------------------------------|
| 1        | facebook.com, facebook.com, meta.com            |
| 2        | amazon.com                                      |
| 3        | twitter.com, dev.twitter.com, twitter.com       |

I want to create a new column that contains the unique values from the home_page column. The final output should be:

| ID | home_page                                 | unique                      |
| -- | ----------------------------------------- | --------------------------- |
| 1  | facebook.com, facebook.com, meta.com      | facebook.com,meta.com       |
| 2  | amazon.com                                | amazon.com                  |
| 3  | twitter.com, dev.twitter.com, twitter.com | twitter.com,dev.twitter.com |

I tried the following:

final["home_page"] = final["home_page"].str.split(',').apply(lambda x : ','.join(set(x)))

But when I do that I get

TypeError: float object is not iterable

The column has no NaN but just in case I tried

final["home_page"] = final["home_page"].str.split(',').apply(lambda x : ','.join(set(x)))

But the entire column comes back empty when I do that.

CodePudding user response:

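You are right that this is coming from np.nan values, which are of type float. The issue happens here: set(np.nan), because a float is not iterable. The following should work for you (and should be faster), since np.unique accepts a scalar NaN without raising. Note that np.unique returns the values sorted, not in their original order.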

df["home_page"].str.replace(' ', '').str.split(',').apply(np.unique)

If you actually want a string at the end you can throw the following at the end:

.apply(lambda x: ','.join(str(i) for i in x))
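If the original order matters (the expected output in the question lists twitter.com before dev.twitter.com, while np.unique sorts), an order-preserving variant might look like the sketch below. It is only one possible approach, not the method from the answer above: the helper name `unique_in_order` and the extra row with a missing value are assumptions added for illustration, and the result is written to a new `unique` column rather than overwriting `home_page`.

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question, plus one missing value (an assumption)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "home_page": [
        "facebook.com, facebook.com, meta.com",
        "amazon.com",
        "twitter.com, dev.twitter.com, twitter.com",
        np.nan,
    ],
})

def unique_in_order(cell):
    """Hypothetical helper: de-duplicate a comma-separated string,
    keeping the first occurrence of each value in its original position."""
    if not isinstance(cell, str):
        return cell  # leave NaN and other non-string values untouched
    parts = [p.strip() for p in cell.split(",")]
    # dict.fromkeys keeps insertion order, unlike set() or np.unique
    return ",".join(dict.fromkeys(p for p in parts if p))

df["unique"] = df["home_page"].apply(unique_in_order)
print(df["unique"].tolist())
```

Rows that are not strings pass through unchanged, so a stray NaN stays NaN instead of raising the TypeError from the question or turning into the text 'nan'.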