Pandas: Python check Multiple columns if contains value then return value to new column-CodePudding

Currently facing an issue (data cleaning) of looking at multiple columns in a df that contain the word "food" and then creating a new column named 'food' with the found value

I can search for each specific column however running into issues when searching through the entire dataset (looking at every column and not overriding the previous entry)

s = df['tag_1'].str.contains('food')
df['food'] = df.tag_1.where(s)

Current:

tag_1	tag_2	tag_3
placeholder	placeholder	French food
British food	placeholder	placeholder
placeholder	German food	placeholder

Ideal:

tag_1	tag_2	tag_3	food
placeholder	placeholder	French food	French food
British food	placeholder	placeholder	British food
placeholder	German food	placeholder	German food

CodePudding user response：

You can mask and bfill to get the first match:

mask = df.apply(lambda c: c.str.contains('food'))

df['food'] = df.where(mask).bfill(axis=1).iloc[:, 0]

output:

          tag_1        tag_2        tag_3          food
0   placeholder  placeholder  French food   French food
1  British food  placeholder  placeholder  British food
2   placeholder  German food  placeholder   German food

CodePudding user response：

Here are some additional ways by first stacking the data.

s = df.stack()
df.assign(food = s.loc[s.str.contains('food').loc[lambda x: x].index].to_numpy())

df.assign(food = df.stack().where(lambda x: x.str.contains('food')).groupby(level=0).first())

Output:

          tag_1        tag_2        tag_3          food
0   placeholder  placeholder  French food   French food
1  British food  placeholder  placeholder  British food
2   placeholder  German food  placeholder   German food