Replace dataframe columns values with string if value contains case insensitive string

Time:12-02

I have a dataframe like so:

id      familyHistoryDiabetes
0       YES - Father Diabetic
1       NO FAMILY HISTORY OF DIABETES
2       Yes-Mother & Father have Type 2 Diabetes
3       NO FAMILY HISTORY OF DIABETES

I would like to replace the column values with a simple 'yes' if the string contains 'yes' and 'no' if the string contains 'no'.

To do this I ran the following code:

df['familyHistoryDiabetes'] = df['familyHistoryDiabetes'].apply(lambda x: 'Yes' if 'Yes' in x else 'No')

After running this I realised that this would miss cases where 'yes' was all uppercase:

id      familyHistoryDiabetes
0       No
1       No
2       Yes
3       No

So I want to run similar code that ignores the case of 'yes' when searching for it.

To do this I tried a solution like the one mentioned here using casefold() like so:

df['familyHistoryDiabetes'] = df['familyHistoryDiabetes'].apply(lambda x: 'Yes' if 'YES'.casefold() in map(str.casefold, x) else 'No')

But this did not work as it resulted in my dataframe becoming:

id      familyHistoryDiabetes
0       No
1       No
2       No
3       No

I can imagine this is an easy fix but I am out of ideas!

Thanks.

CodePudding user response:

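Try np.where together with str.contains and case=False, which makes the match case-insensitive:

import numpy as np

# 'Yes' where the text contains "yes" in any casing, otherwise 'No'
df['new'] = np.where(df['familyHistoryDiabetes'].str.contains('Yes', case=False),
                     'Yes',
                     'No')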
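
As for why the casefold() attempt in the question returned 'No' everywhere: each x is a single string, so map(str.casefold, x) yields one character at a time, and the three-character string 'yes' can never equal a single character. A minimal sketch of the difference in plain Python, without pandas (the helper name contains_yes and the values list are only for illustration):

```python
def contains_yes(text: str) -> bool:
    # Case-fold the whole string once, then do an ordinary substring test.
    return "yes" in text.casefold()

values = [
    "YES - Father Diabetic",
    "NO FAMILY HISTORY OF DIABETES",
    "Yes-Mother & Father have Type 2 Diabetes",
    "NO FAMILY HISTORY OF DIABETES",
]

# The original attempt: map() walks the string character by character,
# so 'yes' is compared against 'y', 'e', 's', ... and never matches.
broken = ["Yes" if "YES".casefold() in map(str.casefold, v) else "No" for v in values]

# Substring test against the case-folded string.
fixed = ["Yes" if contains_yes(v) else "No" for v in values]

print(broken)
print(fixed)
```

The same function could be passed to apply, e.g. df['familyHistoryDiabetes'].apply(lambda x: 'Yes' if contains_yes(x) else 'No'). Keep in mind that this is a plain substring test, so a value containing a word like "eyes" would also count as a match.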

CodePudding user response:

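With str.extract, after lowercasing the column so that the pattern only needs to match 'yes' or 'no':

# expand=False returns a Series rather than a one-column DataFrame
df["familyHistoryDiabetes"] = df["familyHistoryDiabetes"].str.lower().str.extract("(yes|no)", expand=False)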

>>> df
   id familyHistoryDiabetes
0   0                   yes
1   1                    no
2   2                   yes
3   3                    no

CodePudding user response:

You can use str.extract with the re.IGNORECASE flag. Rows without a match come back as NaN, which fillna turns into 'No':

import re
df['new'] = df['familyHistoryDiabetes'].str.extract('(Yes)', flags=re.IGNORECASE, expand=False).fillna('No')

Output:

   id                     familyHistoryDiabetes  new
0   0                     YES - Father Diabetic  YES
1   1             NO FAMILY HISTORY OF DIABETES   No
2   2  Yes-Mother & Father have Type 2 Diabetes  Yes
3   3             NO FAMILY HISTORY OF DIABETES   No
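
Note that the output above keeps whatever casing was matched ('YES' in row 0, 'Yes' in row 2). If a uniform label is wanted, one option is to capitalize the extracted text before filling the gaps. A rough sketch, assuming the sample data from the question (the intermediate name matched is only for illustration):

```python
import re
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "familyHistoryDiabetes": [
        "YES - Father Diabetic",
        "NO FAMILY HISTORY OF DIABETES",
        "Yes-Mother & Father have Type 2 Diabetes",
        "NO FAMILY HISTORY OF DIABETES",
    ],
})

# Case-insensitive extract; rows without a match are missing values.
matched = df["familyHistoryDiabetes"].str.extract(
    r"(yes)", flags=re.IGNORECASE, expand=False
)

# Normalize 'YES'/'yes'/'Yes' to 'Yes', then label the non-matches 'No'.
df["new"] = matched.str.capitalize().fillna("No")

print(df)
```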