Let's say I have the following pd.DataFrame
>>> df = pd.DataFrame({
'col_1': ['Elon', 'Jeff', 'Warren', 'Mark'],
'col_2': ['nan', 'Bezos', 'Buffet', 'nan'],
'col_3': ['nan', 'Amazon', 'Berkshire', 'Meta'],
})
which gets me
col_1 col_2 col_3
0 Elon nan nan
1 Jeff Bezos Amazon
2 Warren Buffet Berkshire
3 Mark nan Meta
All column types are strings. I would like a way to obtain the number of rows per column where the cell value is 'nan'
.
Where I simply run the following I get always zeros as missing count since it doesnt check for string which contain nan.
>>> df.isna().sum()
col_1 0
col_2 0
col_3 0
dtype: int64
However, what I want is to get
col_1 0
col_2 2
col_3 1
How can I do that?
CodePudding user response:
you have nan
as string , you can do :
df.eq("nan").sum()
output :
col_1 0
col_2 2
col_3 1
dtype: int64
CodePudding user response:
It took me a while to see that you changed your initial code for the dataset. However, if you would like to extract all of the rows where you have the 'nan'
string, I would use a mask.
mask = np.column_stack([df[col].str.contains("nan", na = False) for col in df])
df_new = df.loc[mask.any(axis = 1)]
This creates a new data frame that you can experiment with.