Pandas: How do I create a new column given the column values exist in a list of lists?

What's the neatest way to create a new column that holds, for each row, the sublist (from a list of lists) containing the value of another column, with a couple of extra conditions?

So the dataframe and the nested list are:

import numpy as np
import pandas as pd

df = pd.DataFrame({"col": ["A", "B", "D", "G", "C", np.nan, "H"]})

categ = [["A", "D"], ["Missing", "C"], ["Other"]]

In my case, np.nan should be treated as "Missing", and any column value that is not present in any of the lists should be treated as "Other".

So the resulting df should look like this:

   col        NewCol
0    A        [A, D]
1    B       [Other]
2    D        [A, D]
3    G       [Other]
4    C  [Missing, C]
5  NaN  [Missing, C]
6    H       [Other]

CodePudding user response:

You can use a simple for loop to check which sublist, if any, contains each value in the column.

import pandas as pd

result = []

# Look up the two fallback groups once; no fuzzy matching is needed.
missing = next(group for group in categ if "Missing" in group)
other = next(group for group in categ if "Other" in group)

for value in df["col"]:
    if pd.isna(value):
        # NaN counts as "Missing".
        result.append(missing)
        continue
    for group in categ:
        if value in group:
            result.append(group)
            break
    else:
        # The inner loop finished without a match.
        result.append(other)

df["NewCol"] = result

   col        NewCol
0    A        [A, D]
1    B       [Other]
2    D        [A, D]
3    G       [Other]
4    C  [Missing, C]
5  NaN  [Missing, C]
6    H       [Other]
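
As an alternative to the nested loop, the same result can probably be obtained with a dictionary lookup. This is only a sketch: the names `lookup` and `keys` are made up for illustration, and it assumes every value appears in at most one sublist and that the "Missing" and "Other" sublists are always present in categ.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": ["A", "B", "D", "G", "C", np.nan, "H"]})
categ = [["A", "D"], ["Missing", "C"], ["Other"]]

# Map every listed value to the sublist that contains it
# (hypothetical helper name; assumes no value occurs in two sublists).
lookup = {value: group for group in categ for value in group}

# Treat NaN as the literal key "Missing" so it hits the same sublist as "C".
keys = df["col"].fillna("Missing")

# Values that are not in any sublist fall back to the "Other" sublist.
df["NewCol"] = [lookup.get(key, lookup["Other"]) for key in keys]

print(df)
```

The list comprehension is used instead of Series.map because the fallback is itself a list, which is awkward to pass to fillna. Each value costs one dictionary lookup rather than a scan of every sublist, which may matter on larger frames.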