What's the neatest way to create a new column that holds, for each row, the sublist of a nested list that contains the value from another column, with a couple of extra conditions?
So the dataframe and the nested list are:
import numpy as np
import pandas as pd
df = pd.DataFrame({"col": ["A", "B", "D", "G", "C", np.nan, "H"]})
categ = [["A", "D"], ["Missing", "C"], ["Other"]]
In my case I would also like np.nan to be considered as "Missing" and if the column value is not present in the lists then it should be considered as "Other".
So the resulting df should look like this:
col NewCol
0 A [A, D]
1 B [Other]
2 D [A, D]
3 G [Other]
4 C [Missing, C]
5 NaN [Missing, C]
6 H [Other]
CodePudding user response:
You can use a simple for loop to check which sublist each value in the column belongs to.
# Find the fallback sublists once, by the label they contain.
missing = next(group for group in categ if "Missing" in group)
other = next(group for group in categ if "Other" in group)

result = []
for value in df["col"]:
    if pd.isna(value):
        # NaN counts as "Missing".
        result.append(missing)
        continue
    for group in categ:
        if value in group:
            result.append(group)
            break
    else:
        # No sublist contains the value, so it counts as "Other".
        result.append(other)
df["NewCol"] = result
col NewCol
0 A [A, D]
1 B [Other]
2 D [A, D]
3 G [Other]
4 C [Missing, C]
5 NaN [Missing, C]
6 H [Other]
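If you'd rather avoid the nested loop, the same rules can be written as a dictionary lookup. This is only a sketch under the assumptions in the question: each label appears in exactly one sublist, and "Missing" and "Other" are present in categ. The names lookup and keys are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": ["A", "B", "D", "G", "C", np.nan, "H"]})
categ = [["A", "D"], ["Missing", "C"], ["Other"]]

# Map every label to the sublist that contains it, e.g. "A" -> ["A", "D"].
lookup = {label: group for group in categ for label in group}

# NaN is treated as "Missing"; labels absent from the lookup fall back to "Other".
keys = df["col"].fillna("Missing")
df["NewCol"] = [lookup.get(key, lookup["Other"]) for key in keys]

print(df)
```

A list comprehension is used instead of Series.map because the mapped values are lists, and filling unmatched rows with a list default is awkward with map plus fillna.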