Find outliers in a DataFrame and fill with NaN in Python

I am trying to write a function that finds the columns with "100" in the header and replaces every value above 100 in those columns with NaN:

import pandas as pd

data = {
    'first_100': ['25', '1568200', '5'],
    'second_column': ['first_value', 'second_value', 'third_value'],
    'third_100': ['89', '9', '589'],
    'fourth_column': ['first_value', 'second_value', 'third_value'],
}

df = pd.DataFrame(data)

print(df)

The output I am looking for is the same DataFrame, with every value above 100 in the "100" columns replaced by NaN.

CodePudding user response:

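Use filter to select the columns whose names contain '100', to_numeric to convert their string values to numbers (anything unparseable becomes NaN with errors='coerce'), then mask to replace the values where the boolean array is True, i.e. where the number is greater than 100: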

cols = df.filter(like='100').columns

df[cols] = df[cols].mask(df[cols].apply(pd.to_numeric, errors='coerce').gt(100))

output:

  first_100 second_column third_100 fourth_column
0        25   first_value        89   first_value
1       NaN  second_value         9  second_value
2         5   third_value       NaN   third_value
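
Since the question asks for a function, the same steps could be wrapped up roughly as follows. This is only a sketch: the name mask_above and the parameters marker and limit are made up for illustration and are not part of pandas. It also differs from the one-liner above in one respect: it keeps the converted numeric values in the result rather than the original strings, and it returns a copy instead of modifying df in place.

```python
import pandas as pd


def mask_above(df, marker="100", limit=100):
    """Return a copy of df with values above `limit` set to NaN
    in every column whose name contains `marker`.

    Hypothetical helper: the name and parameters are illustrative.
    """
    out = df.copy()
    # Columns whose header contains the marker substring
    cols = out.filter(like=marker).columns
    # Convert to numbers; unparseable entries become NaN
    numeric = out[cols].apply(pd.to_numeric, errors="coerce")
    # Replace values strictly greater than the limit with NaN
    out[cols] = numeric.mask(numeric.gt(limit))
    return out


data = {
    "first_100": ["25", "1568200", "5"],
    "second_column": ["first_value", "second_value", "third_value"],
    "third_100": ["89", "9", "589"],
    "fourth_column": ["first_value", "second_value", "third_value"],
}
df = pd.DataFrame(data)

result = mask_above(df)
print(result)
```

Because the comparison is gt, a value of exactly 100 is kept; numeric.ge(limit) would drop it as well. Columns without the marker in their name are left untouched.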