Home > Net >  Pandas: Python check Multiple columns if contains value then return value to new column
Pandas: Python check Multiple columns if contains value then return value to new column

Time:07-28

Currently facing an issue (data cleaning) of looking at multiple columns in a df that contain the word "food" and then creating a new column named 'food' with the found value

I can search for each specific column however running into issues when searching through the entire dataset (looking at every column and not overriding the previous entry)

s = df['tag_1'].str.contains('food')
df['food'] = df.tag_1.where(s)

Current:

tag_1 tag_2 tag_3
placeholder placeholder French food
British food placeholder placeholder
placeholder German food placeholder

Ideal:

tag_1 tag_2 tag_3 food
placeholder placeholder French food French food
British food placeholder placeholder British food
placeholder German food placeholder German food

CodePudding user response:

You can mask and bfill to get the first match:

mask = df.apply(lambda c: c.str.contains('food'))

df['food'] = df.where(mask).bfill(axis=1).iloc[:, 0]

output:

          tag_1        tag_2        tag_3          food
0   placeholder  placeholder  French food   French food
1  British food  placeholder  placeholder  British food
2   placeholder  German food  placeholder   German food

CodePudding user response:

Here are some additional ways by first stacking the data.

s = df.stack()
df.assign(food = s.loc[s.str.contains('food').loc[lambda x: x].index].to_numpy())

or

df.assign(food = df.stack().where(lambda x: x.str.contains('food')).groupby(level=0).first())

Output:

          tag_1        tag_2        tag_3          food
0   placeholder  placeholder  French food   French food
1  British food  placeholder  placeholder  British food
2   placeholder  German food  placeholder   German food
  • Related