Show how many times a certain value occurs in each column in pandas

My pandas data frame contains several columns, some of which have missing values that show up as a ? sign. I want to run a for loop to print how many ? values there are in each column of the data. I'm doing something like this:

colnames = ['col_1','col_2','col_3']

for i in colnames:
    print(f'In the {i} feature, the value - ? - occurs {data.i.value_counts()["?"]} times')

The error I get is:

AttributeError: 'DataFrame' object has no attribute 'i'

So I think the problem is with this part, data.i.value_counts(). I tried data[i].value_counts() but that didn't work either.

CodePudding user response：

To count values, avoid value_counts, because selecting ? from its result fails with a KeyError if the value does not exist in the column. It is simpler to compare the values with ? and count the True results with sum:

for i in colnames:
    print(f'In the {i} feature, the value - ? - occurs {data[i].eq("?").sum()} times')
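
The same comparison can also be applied to several columns at once, without a loop. The following is a minimal sketch, assuming a small made-up data frame (the sample values are hypothetical, not from the question) with the column names used above:

```python
import pandas as pd

# Hypothetical sample data; "?" marks a missing value.
data = pd.DataFrame({
    "col_1": ["a", "?", "?"],
    "col_2": ["?", "b", "c"],
    "col_3": ["x", "y", "z"],
})
colnames = ["col_1", "col_2", "col_3"]

# eq("?") returns a boolean frame; sum() counts the True values per column.
counts = data[colnames].eq("?").sum()

# counts is a Series indexed by column name.
for name, n in counts.items():
    print(f'In the {name} feature, the value - ? - occurs {n} times')
```

A column without any ? simply gets a count of 0 here, so no KeyError can occur.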

CodePudding user response：

Considering that the dataframe is data, if OP wants to use .value_counts(), adjust to the following:

colnames = ['col_1','col_2','col_3']

for i in colnames:
    print(f'In the {i} feature, the value - ? - occurs {data[i].value_counts()["?"]} times')

Or, if one wants to know this for all columns of the dataframe data, use:

for i in data.columns:
    print(f'In the {i} feature, the value - ? - occurs {data[i].value_counts()["?"]} times')

If, on the other hand, one wants to prevent the KeyError (see the note below), one can use .isin with .sum() as follows:

for i in colnames:
    print(f'In the {i} feature, the value - ? - occurs {data[i].isin(["?"]).sum()} times')

Notes:

If a specific column doesn't contain ?, data[i].value_counts()["?"] raises KeyError: '?', so with that approach it might be more convenient to select only the columns that contain ?, instead of applying it to all the dataframe columns.
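
If one prefers to keep .value_counts(), one possible way around the KeyError is to look the value up with Series.get and a default of 0 instead of square brackets. This is only a sketch on made-up sample data (the frame below is hypothetical):

```python
import pandas as pd

# Hypothetical sample data; col_3 contains no "?" at all.
data = pd.DataFrame({
    "col_1": ["a", "?", "?"],
    "col_3": ["x", "y", "z"],
})

# .get("?", 0) returns 0 when "?" is not among the counted values,
# whereas value_counts()["?"] would raise KeyError for col_3.
counts = {col: data[col].value_counts().get("?", 0) for col in data.columns}

for col, n in counts.items():
    print(f'In the {col} feature, the value - ? - occurs {n} times')
```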