This is likely a very obvious question, but I have a column containing lists that I am trying to flatten: i.e [AB, CD] -> AB, CD
.
Sample dataframe:
data = [
["ABC", ["AB", "AB", "EF"]],
["DEF", ["CD", "EF"]],
["GHI", ["JK"]],
[
"JKL",
np.nan,
],
]
df = pd.DataFrame(data, columns=["ID", "list"])
df
I am applying this function to said column:
def func(string):
flattened_string = ", ".join(map(str, string))
return flattened_string
df = df["list_column"].apply(func)
However there are a few rows containing NaN
values that initiate the error "TypeError: 'float' object is not iterable"
. Is there any way I can modify this function to fix this? Or perhaps (likely) a much better way to do this?
Thank you!!!
CodePudding user response:
You need to fill the NaN
s in that column to something that's empty but iterable, like ""
or []
. You can also simplify your function to just .apply(", ".join)
, assuming the column of lists you're joining are only strings.
df["list"] = df["list"].fillna("").apply(", ".join)
CodePudding user response:
If needing to preserve the NaN
values in the column, we can use setting with enlargement by only joining the notna
values in the column and assigning back:
df['list'] = df.loc[df['list'].notnull(), 'list'].map(','.join)
Or if there are non-strings in the lists:
df['list'] = df.loc[
df['list'].notnull(), 'list'
].map(lambda x: ','.join(map(str, x)))
df
:
ID list
0 ABC AB,AB,EF
1 DEF CD,EF
2 GHI JK
3 JKL NaN # Still NaN not empty string