Home > Mobile >  How to flatten column of lists when some contain Null values?
How to flatten column of lists when some contain Null values?

Time:10-15

This is likely a very obvious question, but I have a column containing lists that I am trying to flatten: i.e [AB, CD] -> AB, CD.

Sample dataframe:

data = [
    ["ABC", ["AB", "AB", "EF"]],
    ["DEF", ["CD", "EF"]],
    ["GHI", ["JK"]],
    [
        "JKL",
        np.nan,
    ],
]

df = pd.DataFrame(data, columns=["ID", "list"])
df

I am applying this function to said column:

def func(string):
    flattened_string = ", ".join(map(str, string))
    return flattened_string
df = df["list_column"].apply(func)

However there are a few rows containing NaN values that initiate the error "TypeError: 'float' object is not iterable". Is there any way I can modify this function to fix this? Or perhaps (likely) a much better way to do this?

Thank you!!!

CodePudding user response:

You need to fill the NaNs in that column to something that's empty but iterable, like "" or []. You can also simplify your function to just .apply(", ".join), assuming the column of lists you're joining are only strings.

df["list"] = df["list"].fillna("").apply(", ".join)

CodePudding user response:

If needing to preserve the NaN values in the column, we can use setting with enlargement by only joining the notna values in the column and assigning back:

df['list'] = df.loc[df['list'].notnull(), 'list'].map(','.join)

Or if there are non-strings in the lists:

df['list'] = df.loc[
    df['list'].notnull(), 'list'
].map(lambda x: ','.join(map(str, x)))

df:

    ID      list
0  ABC  AB,AB,EF
1  DEF     CD,EF
2  GHI        JK
3  JKL       NaN  # Still NaN not empty string
  • Related