My pandas DataFrame contains several columns, some of which have missing values that show up as a ? sign. I want to run a for loop to print how many ? values there are in each column of the data. I'm doing something like this:
colnames = ['col_1','col_2','col_3']
for i in colnames:
    print(f'In the {i} feature, the value - ? - occurs {data.i.value_counts()["?"]} times')
The error I get is:
AttributeError: 'DataFrame' object has no attribute 'i'
So I think the problem is with this part: data.i.value_counts(). I tried data[i].value_counts(), but that didn't work either.
CodePudding user response:
To count values, avoid value_counts, because selecting ? fails if that value does not exist in the column. It is simpler to compare the values with ? and count the True values with sum:
for i in colnames:
    print(f'In the {i} feature, the value - ? - occurs {data[i].eq("?").sum()} times')
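For reference, here is a self-contained sketch of the above on a small frame. The sample values are made up for illustration; only the column names come from the question. It also shows that the loop can be replaced by a single call on the selected columns:

```python
import pandas as pd

# Hypothetical sample data; "?" marks a missing value
data = pd.DataFrame({
    "col_1": ["a", "?", "b", "?"],
    "col_2": ["?", "x", "y", "z"],
    "col_3": ["p", "q", "r", "s"],  # no "?" in this column
})
colnames = ["col_1", "col_2", "col_3"]

# Per-column count with a loop: eq("?") gives a boolean Series, sum() counts the True values
counts = {}
for i in colnames:
    counts[i] = int(data[i].eq("?").sum())
    print(f"In the {i} feature, the value - ? - occurs {counts[i]} times")

# Same counts without a loop: one Series indexed by column name
counts_series = data[colnames].eq("?").sum()
print(counts_series)
```

A column without any ? simply reports 0 here, instead of raising an error.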
CodePudding user response:
Considering that the dataframe is data, if OP wants to use .value_counts(), adjust to the following:
colnames = ['col_1','col_2','col_3']
for i in colnames:
    print(f'In the {i} feature, the value - ? - occurs {data[i].value_counts()["?"]} times')
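The reason the attempt in the question raises AttributeError is that data.i asks for an attribute literally named i; the loop variable is not substituted in attribute access, whereas data[i] looks up the column whose name is stored in i. A small sketch of the difference, using made-up sample data:

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({"col_1": ["?", "a"], "col_2": ["b", "?"]})

i = "col_1"

# Attribute access uses the literal name "i", not the value of the variable i
try:
    data.i
except AttributeError as err:
    message = str(err)
    print(message)

# Bracket access uses the value of i, i.e. the column "col_1"
n = data[i].value_counts()["?"]
print(n)

# getattr(data, i) is the dynamic equivalent of data.col_1, but data[i] is the usual idiom
same = getattr(data, i).value_counts()["?"]
```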
Or, if one wants to know this for all columns of the dataframe data, use:
for i in data.columns:
    print(f'In the {i} feature, the value - ? - occurs {data[i].value_counts()["?"]} times')
If, on the other hand, one wants to prevent the KeyError (see the note below), one can use .isin with .sum() as follows:
for i in colnames:
    print(f'In the {i} feature, the value - ? - occurs {data[i].isin(["?"]).sum()} times')
Notes:
- If a specific column doesn't contain ?, .value_counts()["?"] will raise KeyError: '?', so it might be more convenient to select only the columns that contain ?, instead of applying it to all the dataframe columns.
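As a sketch of the note above (the sample frame is invented for illustration), one can first pick out the columns that contain ?, or query the result of .value_counts() with .get and a default of 0, so that a column without ? does not raise:

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({
    "col_1": ["?", "a", "?"],
    "col_2": ["b", "c", "d"],  # no "?" in this column
})

# Keep only the columns where "?" occurs at least once
cols_with_q = [c for c in data.columns if data[c].eq("?").any()]

for i in cols_with_q:
    print(f'In the {i} feature, the value - ? - occurs {data[i].value_counts()["?"]} times')

# Alternative: Series.get returns the default (0) when "?" is not in the index
safe_counts = {c: int(data[c].value_counts().get("?", 0)) for c in data.columns}
print(safe_counts)
```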