How to flatten column of lists when some contain Null values?-CodePudding

This is likely a very obvious question, but I have a column containing lists that I am trying to flatten: i.e [AB, CD] -> AB, CD.

Sample dataframe:

data = [
    ["ABC", ["AB", "AB", "EF"]],
    ["DEF", ["CD", "EF"]],
    ["GHI", ["JK"]],
    [
        "JKL",
        np.nan,
    ],
]

df = pd.DataFrame(data, columns=["ID", "list"])
df

I am applying this function to said column:

def func(string):
    flattened_string = ", ".join(map(str, string))
    return flattened_string

df = df["list_column"].apply(func)

However there are a few rows containing NaN values that initiate the error "TypeError: 'float' object is not iterable". Is there any way I can modify this function to fix this? Or perhaps (likely) a much better way to do this?

Thank you!!!

CodePudding user response：

You need to fill the NaNs in that column to something that's empty but iterable, like "" or []. You can also simplify your function to just .apply(", ".join), assuming the column of lists you're joining are only strings.

df["list"] = df["list"].fillna("").apply(", ".join)

CodePudding user response：

If needing to preserve the NaN values in the column, we can use setting with enlargement by only joining the notna values in the column and assigning back:

df['list'] = df.loc[df['list'].notnull(), 'list'].map(','.join)

Or if there are non-strings in the lists:

df['list'] = df.loc[
    df['list'].notnull(), 'list'
].map(lambda x: ','.join(map(str, x)))

df:

    ID      list
0  ABC  AB,AB,EF
1  DEF     CD,EF
2  GHI        JK
3  JKL       NaN  # Still NaN not empty string