Dataframe conditional replacement with integers

I have a dataframe column like this:

df['col_name'].unique()
>>>array([-1, 'Not Passed, On the boundary', 1, 'Passed, On the boundary',
       'Passed, Unclear result', 'Passes, Unclear result, On the boudnary',
       'Rejected, Unclear result'], dtype=object)

In this column, if an element is, or contains, the string 'Passed', I want to replace the entire field with the integer 1; otherwise I want to replace it with the integer -1.

Kindly help me with this.

CodePudding user response：

You can use .str.contains to check whether each value contains the string, treating the NaN that the integer values produce as False. Then use np.where to map True to 1 and False to -1. If you want to keep the original 1 and -1 values unchanged, you can use np.select instead.

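import numpy as np

# na=False treats the NaN produced by non-string entries as "no match"
m1 = df['col_name'].str.contains('Passed', na=False)
m2 = df['col_name'].isin([1, -1])

df['col_name_replace_1_-1'] = np.where(m1, 1, -1)
# Conditions are checked in order, so existing 1/-1 values are kept as-is
df['col_name_keep_1_-1'] = np.select([m2, m1, ~m1], [df['col_name'], 1, -1], default=df['col_name'])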

print(df)

                                  col_name  col_name_replace_1_-1 col_name_keep_1_-1
0                                       -1                     -1                 -1
1              Not Passed, On the boundary                      1                  1
2                                        1                     -1                  1
3                  Passed, On the boundary                      1                  1
4                   Passed, Unclear result                      1                  1
5  Passes, Unclear result, On the boudnary                     -1                 -1
6                 Rejected, Unclear result                     -1                 -1
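
As the output shows, a plain substring test also flags 'Not Passed, On the boundary' as 1, because it contains 'Passed'. The question does not say whether that is wanted. The sketch below rebuilds the sample column from the values in the question and compares the substring match with a stricter variant that skips 'Passed' when 'Not ' comes directly before it. The column names flag_substring and flag_strict, and the lookbehind pattern, are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the unique values shown in the question.
values = [
    -1,
    "Not Passed, On the boundary",
    1,
    "Passed, On the boundary",
    "Passed, Unclear result",
    "Passes, Unclear result, On the boudnary",
    "Rejected, Unclear result",
]
df = pd.DataFrame({"col_name": values})

# Plain substring match; non-string entries count as "no match" via na=False.
contains_passed = df["col_name"].str.contains("Passed", regex=False, na=False)
df["flag_substring"] = np.where(contains_passed, 1, -1)

# Assumed variant: ignore "Passed" when "Not " comes right before it.
strict_passed = df["col_name"].str.contains(r"(?<!Not )Passed", regex=True, na=False)
df["flag_strict"] = np.where(strict_passed, 1, -1)

print(df)
```

Only the 'Not Passed, On the boundary' row differs between the two columns, so which one to use depends on how that value should be treated.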

CodePudding user response：

df['col_name'] = df['col_name'].apply(lambda x: 1 if 'Passed' in str(x) else -1)

This goes through each entry in df['col_name'], checks whether it contains 'Passed', and replaces it with 1 or -1 accordingly. Because the column also holds integers, each entry is converted with str(x) first; without that, the in test raises a TypeError on the integer entries.
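
The same element-wise idea can also be written with an explicit type check rather than a string conversion, which may be easier to read when the column mixes integers and strings. This is only a sketch: the helper name to_flag and the short sample Series are made up for illustration.

```python
import pandas as pd


def to_flag(value):
    """Return 1 if value is a string containing 'Passed', otherwise -1."""
    return 1 if isinstance(value, str) and "Passed" in value else -1


# Small mixed-type sample (hypothetical subset of the question's values).
s = pd.Series([-1, "Passed, Unclear result", 1, "Rejected, Unclear result"])
flags = s.map(to_flag)

print(flags.tolist())
```

Note that, as with np.where above, an existing integer 1 is mapped to -1 here, since it is not a string containing 'Passed'.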