I have a dataframe column like this:
df['col_name'].unique()
>>>array([-1, 'Not Passed, On the boundary', 1, 'Passed, On the boundary',
'Passed, Unclear result', 'Passes, Unclear result, On the boudnary',
'Rejected, Unclear result'], dtype=object)
If an element of this column contains the substring 'Passed', I want to replace the entire element with the integer 1; otherwise I want to replace it with the integer -1.
How can I do this?
CodePudding user response:
You can use .str.contains to check whether each value contains the string. The integer entries produce NaN, which fillna(False) turns into False. Then use np.where to replace True with 1 and False with -1. If you want to keep the original 1 and -1, you can try np.select instead.
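import numpy as np
# True where the value is a string containing 'Passed'; non-strings give NaN, filled with False
m1 = df['col_name'].str.contains('Passed').fillna(False)
# True where the value is already the integer 1 or -1
m2 = df['col_name'].isin([1, -1])
df['col_name_replace_1_-1'] = np.where(m1, 1, -1)
df['col_name_keep_1_-1'] = np.select([m2, m1, ~m1], [df['col_name'], 1, -1], default=df['col_name'])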
print(df)
                                  col_name  col_name_replace_1_-1  col_name_keep_1_-1
0                                       -1                     -1                  -1
1              Not Passed, On the boundary                      1                   1
2                                        1                     -1                   1
3                  Passed, On the boundary                      1                   1
4                   Passed, Unclear result                      1                   1
5  Passes, Unclear result, On the boudnary                     -1                  -1
6                 Rejected, Unclear result                     -1                  -1
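For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the column from the values listed in the question and, as a swapped-in variation, passes na=False to .str.contains instead of chaining fillna(False), so the mask comes back as plain booleans. The .astype(bool) call is only a defensive assumption on my part.

```python
import numpy as np
import pandas as pd

# Sample data copied from the unique values shown in the question.
df = pd.DataFrame({'col_name': [
    -1,
    'Not Passed, On the boundary',
    1,
    'Passed, On the boundary',
    'Passed, Unclear result',
    'Passes, Unclear result, On the boudnary',
    'Rejected, Unclear result',
]})

# True where the value is a string containing 'Passed'.
# na=False makes the non-string (integer) entries come out as False.
m1 = df['col_name'].str.contains('Passed', na=False).astype(bool)

# True where the value is already the integer 1 or -1.
m2 = df['col_name'].isin([1, -1])

# Variant 1: every value becomes 1 or -1, including the original integers.
df['col_name_replace_1_-1'] = np.where(m1, 1, -1)

# Variant 2: keep the original 1 / -1, map the strings only.
df['col_name_keep_1_-1'] = np.select(
    [m2.to_numpy(), m1.to_numpy()],
    [df['col_name'].to_numpy(), 1],
    default=-1,
)

print(df)
```

Note that 'Not Passed, On the boundary' also maps to 1, because the check is a plain substring match, which is what the question asked for.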
CodePudding user response:
df['col_name'] = df['col_name'].apply(lambda x: 1 if 'Passed' in str(x) else -1)
This checks whether each entry in df['col_name'] contains 'Passed' and replaces it with 1 or -1 accordingly. The str(x) conversion lets the check run on the integer entries as well; they do not contain 'Passed', so they become -1.
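Because the column in the question mixes integers and strings, a per-element function has to guard against non-string entries before using `in`. Here is a minimal sketch of one way to do that, assuming the existing 1 and -1 should be kept as they are. The name to_flag is a hypothetical helper, and the sample values are copied from the question.

```python
import pandas as pd


def to_flag(value):
    """Map one entry to 1 or -1 (hypothetical helper for illustration)."""
    if isinstance(value, str):
        # Plain substring check, so 'Not Passed, ...' also counts as a match.
        return 1 if 'Passed' in value else -1
    # Assumption: non-string entries are the existing 1 / -1 flags; keep them.
    return value


s = pd.Series([
    -1,
    'Not Passed, On the boundary',
    1,
    'Passed, On the boundary',
    'Passed, Unclear result',
    'Passes, Unclear result, On the boudnary',
    'Rejected, Unclear result',
], name='col_name')

flags = s.apply(to_flag)
print(flags.tolist())  # [-1, 1, 1, 1, 1, -1, -1]
```

If the original integers should be overwritten too, returning -1 in the last line of to_flag instead of value should give the same result as the one-liner.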