I have a dataframe that contains images:
SOME_COL  SOME_COL  IMAGE_MAIN  IMAGE_2  IMAGE_3  IMAGE_4  IMAGE_5  IMAGE_6
*         *         0           1        2        3        NaN      5
I want to drop the IMAGE_MAIN and IMAGE_[2..6] columns and create a new one, IMAGES:

SOME_COL  SOME_COL  IMAGES
*         *         [0, 1, 2, 3, 5]

If any image is NaN, I would like to skip that value instead of adding None or NaN to the list.
I tried this but it's obviously not a good way to do that:
main_image = data_main['IMAGE_MAIN']
image_2 = data_main['IMAGE_2']
image_3 = data_main['IMAGE_3']
image_4 = data_main['IMAGE_4']
image_5 = data_main['IMAGE_5']
image_6 = data_main['IMAGE_6']
images = [x for x in [main_image, image_2, image_3, image_4, image_5, image_6] if x]
data_main['IMAGES'] = images
CodePudding user response:
You can start by filtering the columns that start with 'IMAGE' using DataFrame.filter, and then apply a function row-wise with DataFrame.apply that drops the NaN values of each row and collects the rest into a single list:
df['IMAGES'] = (
    df.filter(like='IMAGE')                               # keep only the IMAGE_* columns
      .apply(lambda row: row.dropna().tolist(), axis=1)   # per row: drop NaN, collect the rest into a list
)
Note that any column containing NaN is stored by pandas as float, so the resulting lists will hold floats, not integers. If you want to make sure the values are integers, use lambda row: row.dropna().astype(int).tolist() instead.
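Since the question also asks to drop the original image columns, here is a minimal sketch of the whole transformation; the sample values below are made up to mirror the question's frame, and the integer cast follows the note above:

import pandas as pd

# made-up sample shaped like the question's frame
df = pd.DataFrame({
    "SOME_COL": ["*"],
    "IMAGE_MAIN": [0], "IMAGE_2": [1], "IMAGE_3": [2],
    "IMAGE_4": [3], "IMAGE_5": [None], "IMAGE_6": [5],
})

image_cols = df.filter(like="IMAGE").columns   # IMAGE_MAIN, IMAGE_2, ..., IMAGE_6
df["IMAGES"] = df[image_cols].apply(
    lambda row: row.dropna().astype(int).tolist(), axis=1
)
df = df.drop(columns=image_cols)               # drop the original IMAGE_* columns

print(df)
#   SOME_COL           IMAGES
# 0        *  [0, 1, 2, 3, 5]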
CodePudding user response:
This should do the trick for you: filter out the NA values and join what is left into a list, row by row.
import pandas as pd

df = pd.DataFrame({
    "IMAGE_1": [1, 2, None],
    "IMAGE_2": [4, None, 6],
})

df["IMAGES"] = (
    df
    .filter(regex=r"IMAGE_\d")                                             # select the IMAGE_<n> columns
    .apply(lambda r: r.dropna().to_list(), result_type="reduce", axis=1)   # per row: drop NaN, collect into a list
)
# IMAGE_1 IMAGE_2 IMAGES
# 0 1.0 4.0 [1.0, 4.0]
# 1 2.0 NaN [2.0]
# 2 NaN 6.0 [6.0]
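Note that the lists hold floats (1.0, 4.0) because columns containing NaN are stored as float64. If you want plain integers instead, one possible variant (same approach, just casting inside the lambda) is:

df["IMAGES"] = (
    df
    .filter(regex=r"IMAGE_\d")
    .apply(lambda r: r.dropna().astype(int).to_list(), result_type="reduce", axis=1)
)
# IMAGES becomes [1, 4], [2], [6]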