I have a pandas DataFrame with a column that has a mix of values, like so:
| ID | home_page |
| ---------| ------------------------------------------------|
| 1 | facebook.com, facebook.com, meta.com |
| 2 | amazon.com |
| 3 | twitter.com, dev.twitter.com, twitter.com |
I want to create a new column that contains the unique values from the home_page column. The final output should be:
| ID | home_page | unique |
| -------- | -------------- |---------------------------|
| 1 | facebook.com, facebook.com, meta.com | facebook.com,meta.com |
| 2 | amazon.com | amazon.com |
| 3 | twitter.com, dev.twitter.com, twitter.com |twitter.com,dev.twitter.com|
I tried the following:
final["home_page"] = final["home_page"].str.split(',').apply(lambda x : ','.join(set(x)))
But when I do that I get
TypeError: 'float' object is not iterable
The column has no NaN but just in case I tried
final["home_page"] = final["home_page"].str.split(',').apply(lambda x : ','.join(set(x)))
But the entire column comes back empty when I do that.
CodePudding user response:
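You are right that this is coming from np.nan values, which are of type float. After .str.split(','), a missing cell is still NaN, so the lambda ends up calling set(np.nan), and that raises the TypeError. The following should work for you (and should be faster). Note that np.unique returns the values sorted, so the original order within each cell is not preserved.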
df["home_page"].str.replace(' ', '').str.split(',').apply(np.unique)
If you actually want a string rather than an array in each cell, you can chain the following onto it:
.apply(lambda x: ','.join(str(i) for i in x))
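If the order shown in the expected output matters (twitter.com before dev.twitter.com), here is a minimal sketch of an order-preserving variant. It is not what the one-liner above does: it swaps np.unique for dict.fromkeys, which keeps the first occurrence of each value. The helper name unique_in_order and the extra NaN row are made up for illustration, and it assumes every non-missing cell is a plain comma-separated string.

```python
import numpy as np
import pandas as pd

def unique_in_order(cell):
    """Deduplicate a comma-separated string, keeping first-seen order."""
    # Missing values (NaN/None) are passed through unchanged instead of
    # being fed to split(), which is what triggers the TypeError.
    if pd.isna(cell):
        return cell
    # Strip the whitespace around each entry and drop empty pieces.
    parts = [p.strip() for p in cell.split(",")]
    # dict.fromkeys keeps only the first occurrence of each key, in order.
    return ",".join(dict.fromkeys(p for p in parts if p))

# Sample data from the question, plus one hypothetical NaN row.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "home_page": [
        "facebook.com, facebook.com, meta.com",
        "amazon.com",
        "twitter.com, dev.twitter.com, twitter.com",
        np.nan,
    ],
})

df["unique"] = df["home_page"].apply(unique_in_order)
print(df)
```

Writing the result to a new column also leaves home_page untouched, which matches the desired output table in the question.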