Home > OS >  Python pandas how to update a column with value 1 if another column contains a certain word
Python pandas how to update a column with value 1 if another column contains a certain word

Time:10-19

df looks like this:

description and keybenefits (14) brand_cooltouch (1711) brand_easylogic (1712)
Lorem Ipsum cooltouch Lorem Ipsum
Lorem Ipsum easylogic Lorem Ipsum
Lorem Ipsum Lorem Ipsum

What I want: When Column description and keybenefits (14) contains the value 'cooltouch' columm brand_cooltouch (1711) needs to be set to value 1 (int). When Column description and keybenefits (14) contains the value 'easylogic' columm brand_easylogic (1712) needs to be set to value 1 (int).

Output that I want:

description and keybenefits (14) brand_cooltouch (1711) brand_easylogic (1712)
Lorem Ipsum cooltouch Lorem Ipsum 1
Lorem Ipsum Lorem Ipsum easylogic 1
Lorem Ipsum Lorem Ipsum

Any help is very much appreciated.

CodePudding user response:

One can use pandas.Series.str.contains.

For the string cooltouch do the following

df['brand_cooltouch (1711)'] = df['description and keybenefits (14)'].str.contains('cooltouch', case=False).astype(int)

[Out]:

    description and keybenefits (14)  brand_cooltouch (1711)  brand_easylogic (1712)
0  Lorem Ipsum cooltouch Lorem Ipsum                       1                     None
1  Lorem Ipsum easylogic Lorem Ipsum                       0                     None
2            Lorem Ipsum Lorem Ipsum                       0                     None

For the string easylogic, do the following

df['brand_easylogic (1712)'] = df['description and keybenefits (14)'].str.contains('easylogic', case=False).astype(int)

[Out]:

    description and keybenefits (14)  brand_cooltouch (1711)  brand_easylogic (1712)
0  Lorem Ipsum cooltouch Lorem Ipsum                       1                     0
1  Lorem Ipsum easylogic Lorem Ipsum                       0                     1
2            Lorem Ipsum Lorem Ipsum                       0                     0

Notes:

  • case=False is to make it case insensitive.

CodePudding user response:

you can use np.where. I'd suggest to fill all cells where the condition is not met with NaN or 0. Here is a solution using np.nan

df["brand_cooltouch (1711)“] = np.where(df["description and keybenefits (14)“].str.contains("cooltouch"), 1, np.nan)
df["brand_easylogic (1712)“] = np.where(df["description and keybenefits (14)“].str.contains("easylogic"), 1, np.nan)

CodePudding user response:

Use Series.str.contains -

df['brand_cooltouch (1711)'] = df['description and keybenefits (14)'].str.contains("cooltouch").astype(int)

Output

    description and keybenefits (14)  brand_cooltouch (1711)  brand_easylogic (1712)
0  Lorem Ipsum cooltouch Lorem Ipsum                       1                     NaN
1  Lorem Ipsum easylogic Lorem Ipsum                       0                     NaN
2            Lorem Ipsum Lorem Ipsum                       0                     NaN

If you do not wish the resulting column to be 1's and 0's - you could also do something like -

df.loc[df['description and keybenefits (14)'].str.contains("cooltouch"), ['brand_cooltouch (1711)']] = '1'
df.loc[~df['description and keybenefits (14)'].str.contains("cooltouch"), ['brand_cooltouch (1711)']] = ''

Output

    description and keybenefits (14) brand_cooltouch (1711)  brand_easylogic (1712)
0  Lorem Ipsum cooltouch Lorem Ipsum                      1                     NaN
1  Lorem Ipsum easylogic Lorem Ipsum                                            NaN
2            Lorem Ipsum Lorem Ipsum                                            NaN
  • Related