I have a file with many columns that I want to analyse with Pandas. How can I drop every column whose percentage of missing values exceeds a given threshold?
CodePudding user response:
threshold = 0.4  # Maximum allowed fraction of missing values (0.4 = 40%)
cols_to_be_dropped = []
for column in df.columns:
    # Fraction of missing values in this column
    if df[column].isna().sum() / len(df[column]) > threshold:
        cols_to_be_dropped.append(column)
df.drop(cols_to_be_dropped, axis=1, inplace=True)
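The same filter can probably be written without an explicit loop, since `df.isna().mean()` gives the fraction of missing values per column. Below is a minimal sketch of that idea. The helper name `drop_sparse_columns` and the sample data are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.4) -> pd.DataFrame:
    """Return a copy of df without columns whose missing fraction exceeds threshold.

    Hypothetical helper for illustration only.
    """
    # Mean of the boolean NaN mask = fraction of missing values per column
    missing_fraction = df.isna().mean()
    # Keep only the columns at or below the threshold
    return df.loc[:, missing_fraction <= threshold].copy()


# Made-up sample data
sample = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],           # 0% missing
    "b": [np.nan, np.nan, np.nan, 1.0],  # 75% missing
    "c": [1.0, np.nan, 3.0, 4.0],        # 25% missing
})

result = drop_sparse_columns(sample, threshold=0.4)
print(list(result.columns))
```

Unlike the loop with `inplace=True`, this version returns a new DataFrame and leaves the original untouched, which tends to be easier to reason about when the threshold is still being tuned.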