I have a dataframe like so:
id familyHistoryDiabetes
0 YES - Father Diabetic
1 NO FAMILY HISTORY OF DIABETES
2 Yes-Mother & Father have Type 2 Diabetes
3 NO FAMILY HISTORY OF DIABETES
I would like to replace the column values with a simple 'yes' if the string contains 'yes' and 'no' if the string contains 'no'.
To do this I ran the following code:
df['familyHistoryDiabetes'] = df['familyHistoryDiabetes'].apply(lambda x: 'Yes' if 'Yes' in x else 'No')
After running this I realised that this would miss cases where 'yes' was all uppercase:
id familyHistoryDiabetes
0 No
1 No
2 Yes
3 No
So I want to run similar code, but ignore the case of 'yes' when searching for it.
To do this I tried a solution like the one mentioned here using casefold() like so:
df['familyHistoryDiabetes'] = df['familyHistoryDiabetes'].apply(lambda x: 'Yes' if 'YES'.casefold() in map(str.casefold, x) else 'No')
But this did not work as it resulted in my dataframe becoming:
id familyHistoryDiabetes
0 No
1 No
2 No
3 No
I can imagine this is an easy fix but I am out of ideas!
Thanks.
CodePudding user response:
Try np.where with str.contains and case=False:
import numpy as np
# case=False makes the substring test case-insensitive
df['new'] = np.where(df['familyHistoryDiabetes'].str.contains('Yes', case=False),
                     'Yes',
                     'No')
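As a side note, the casefold attempt in the question fails because iterating over a string yields single characters, so `map(str.casefold, x)` never produces the whole word 'yes'. Below is a minimal, self-contained sketch of that failure next to the np.where approach. The sample data is copied from the question; the variable names and the `na=False` argument (which treats missing values as "no match") are my own assumptions.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "familyHistoryDiabetes": [
        "YES - Father Diabetic",
        "NO FAMILY HISTORY OF DIABETES",
        "Yes-Mother & Father have Type 2 Diabetes",
        "NO FAMILY HISTORY OF DIABETES",
    ],
})

# Mapping over a string visits one character at a time, so the membership
# test compares 'yes' against single letters and can never succeed.
sample = df["familyHistoryDiabetes"].iloc[0]
chars = list(map(str.casefold, sample))
found_by_char = "yes" in chars                     # list membership
found_by_substring = "yes" in sample.casefold()    # substring test

# Vectorised version: case-insensitive substring test, then choose a label.
has_yes = df["familyHistoryDiabetes"].str.contains("yes", case=False, na=False)
df["new"] = np.where(has_yes, "Yes", "No")
print(df)
```

If you prefer to stay with apply, `lambda x: 'Yes' if 'yes' in x.casefold() else 'No'` should give the same labels for string values.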
CodePudding user response:
With str.extract:
# expand=False returns a Series rather than a one-column DataFrame
df["familyHistoryDiabetes"] = df["familyHistoryDiabetes"].str.lower().str.extract("(yes|no)", expand=False)
>>> df
id familyHistoryDiabetes
0 0 yes
1 1 no
2 2 yes
3 3 no
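One caveat with the bare `(yes|no)` pattern is that it also matches inside longer words. Here is a small sketch of that pitfall, assuming you might have free-text rows beyond the four shown. The third row is hypothetical and not from the question, and the word-boundary pattern is just one possible way to tighten the match.

```python
import pandas as pd

s = pd.Series([
    "YES - Father Diabetic",
    "NO FAMILY HISTORY OF DIABETES",
    "Unknown, possibly yes",  # hypothetical row, not from the question
])

lowered = s.str.lower()

# The bare alternation takes the leftmost hit, which is the 'no' inside 'unknown'.
loose = lowered.str.extract(r"(yes|no)", expand=False)

# \b restricts the match to whole words only.
strict = lowered.str.extract(r"\b(yes|no)\b", expand=False)

print(pd.DataFrame({"text": s, "loose": loose, "strict": strict}))
```

Rows with neither word come back as NaN under either pattern, so you may still want a fillna with whatever default suits your data.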
CodePudding user response:
You can use str.extract with the IGNORECASE flag:
import re
df['new'] = df.familyHistoryDiabetes.str.extract('(Yes)', flags=re.IGNORECASE, expand=False).fillna('No')
Output:
id familyHistoryDiabetes new
0 0 YES - Father Diabetic YES
1 1 NO FAMILY HISTORY OF DIABETES No
2 2 Yes-Mother & Father have Type 2 Diabetes Yes
3 3 NO FAMILY HISTORY OF DIABETES No
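Note that str.extract returns the text exactly as it was matched, which is why row 0 above shows 'YES' rather than 'Yes'. If you want a uniform label, one option is to normalise the case afterwards. This is a rough sketch on the question's data; using `str.capitalize` is my own choice here, not part of the answer above.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "familyHistoryDiabetes": [
        "YES - Father Diabetic",
        "NO FAMILY HISTORY OF DIABETES",
        "Yes-Mother & Father have Type 2 Diabetes",
        "NO FAMILY HISTORY OF DIABETES",
    ],
})

# The captured group keeps its original casing; rows without a match are NaN.
matched = df["familyHistoryDiabetes"].str.extract(
    r"(yes)", flags=re.IGNORECASE, expand=False
)

# Normalise 'YES' / 'yes' to 'Yes', then label the unmatched rows 'No'.
df["new"] = matched.str.capitalize().fillna("No")
print(df[["id", "new"]])
```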