Value count of columns in a pandas DataFrame where where string is 'nan'-CodePudding

Let's say I have the following pd.DataFrame

>>> df = pd.DataFrame({
    'col_1': ['Elon', 'Jeff', 'Warren', 'Mark'],
    'col_2': ['nan', 'Bezos', 'Buffet', 'nan'],
    'col_3': ['nan', 'Amazon', 'Berkshire', 'Meta'],
})

which gets me

    col_1   col_2   col_3
0   Elon    nan     nan
1   Jeff    Bezos   Amazon
2   Warren  Buffet  Berkshire
3   Mark    nan     Meta

All column types are strings. I would like a way to obtain the number of rows per column where the cell value is 'nan'.

Where I simply run the following I get always zeros as missing count since it doesnt check for string which contain nan.

>>> df.isna().sum()

col_1    0
col_2    0
col_3    0
dtype: int64

However, what I want is to get

col_1    0
col_2    2
col_3    1

How can I do that?

CodePudding user response：

you have nan as string , you can do :

df.eq("nan").sum()

output :

col_1    0
col_2    2
col_3    1
dtype: int64

CodePudding user response：

It took me a while to see that you changed your initial code for the dataset. However, if you would like to extract all of the rows where you have the 'nan' string, I would use a mask.

mask = np.column_stack([df[col].str.contains("nan", na = False) for col in df])
df_new = df.loc[mask.any(axis = 1)]

This creates a new data frame that you can experiment with.