I am a complete beginner with Python and pandas. My dataset contains ? values, which are not NaN or null. I want to count how many ? values appear in certain columns.
I tried value_counts() and other counting functions, but they did not work. Specifically, I want to count how many ? there are in the workclass column. Thanks.
I would like a way to do this without scikit-learn or any other ML library.
CodePudding user response:
You can use the isin function for this.
Here is an example.
import pandas as pd
arr = {'col1': ['?', 2, '?', 4], 'col2': [6, 7, 8, 9]}
df = pd.DataFrame(arr)
# isin marks matching cells as True; sum counts them per column
df.isin(['?']).sum()
Output:
col1 2
col2 0
dtype: int64
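To count only a single column, such as workclass from the question, you can compare that column to '?' and sum the resulting booleans. This is a minimal sketch with made-up data (the sample values are assumptions, not the asker's dataset). The str.strip() variant only matters if the file stores the marker with surrounding whitespace, for example ' ?', which is one possible reason an exact match finds nothing.

```python
import pandas as pd

# Hypothetical sample data; the real workclass values may differ.
df = pd.DataFrame({'workclass': ['Private', '?', ' ?', 'State-gov', '?']})

# Exact match: counts only cells that are exactly '?'.
exact_count = (df['workclass'] == '?').sum()

# Strip surrounding whitespace first so ' ?' is counted as well.
stripped_count = (df['workclass'].str.strip() == '?').sum()

print(exact_count, stripped_count)
```

If the two counts differ, the column contains padded markers and the stripped version is probably the one you want.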
CodePudding user response:
Your first task should be to replace every ? with a missing-value marker such as NaN or None, so that you can use built-in pandas functions to count them easily.
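import pandas as pd
import numpy as np
data = {'number': [1, 2, 3, 4, '?', 5],
        'string': ['a', 'b', 'c', 'd', '?', 'e']}
df = pd.DataFrame(data)
# np.nan is the portable spelling; the np.NaN alias was removed in NumPy 2.0
df['number'] = df['number'].replace('?', np.nan)
# np.nan avoids the version-dependent handling of replace('?', None)
df['string'] = df['string'].replace('?', np.nan)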
Now you can count the number of missing values.
df.isna().sum()
Output:
number 1
string 1
dtype: int64
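If the data comes from a CSV file, another option is to have pandas treat ? as missing while reading, using the na_values parameter of read_csv. A small sketch, assuming the file looks roughly like the hypothetical text below (the column names and rows are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = "age,workclass\n39,State-gov\n50,?\n38,?\n"

# na_values='?' tells read_csv to load every '?' cell as NaN.
df = pd.read_csv(io.StringIO(csv_text), na_values='?')

# Count the missing entries in one column.
missing = df['workclass'].isna().sum()
print(missing)
```

With a real file you would pass its path instead of the io.StringIO object.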