I am trying to make a function to spot the columns with "100" in the header and replace all values in these columns that are above 100 with nan values :
import pandas as pd
data = {'first_100': ['25', '1568200', '5'],
'second_column': ['first_value', 'second_value', 'third_value'],
'third_100':['89', '9', '589'],
'fourth_column':['first_value', 'second_value', 'third_value'],
}
df = pd.DataFrame(data)
print (df)
so this is the output I am looking for
CodePudding user response:
Use filter
to identify the columns with '100'
, to_numeric
to ensure having numeric values, then mask
with a boolean array:
cols = df.filter(like='100').columns
df[cols] = df[cols].mask(df[cols].apply(pd.to_numeric, errors='coerce').gt(100))
output:
first_100 second_column third_100 fourth_column
0 25 first_value 89 first_value
1 NaN second_value 9 second_value
2 5 third_value NaN third_value