I created the following dataframe and would like to identify the cells that are null:
import pandas as pd
import numpy as np
data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': np.nan, 'c': ""},
        {'a': 10, 'b': "", 'c': np.nan}]
df = pd.DataFrame(data)
    a    b    c
0   1    2    3
1  10  NaN
2  10       NaN
I used the following code:
x1 = np.where(pd.isnull(df))
and got this result:
print(x1)
(array([1, 2], dtype=int64), array([1, 2], dtype=int64))
However, I want to list the cell location explicitly for each NaN entry. I tried the zip function, but got the following error message:
print(set(zip(x1)))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [41], in <cell line: 1>()
----> 1 print(set(zip(x1)))
TypeError: unhashable type: 'numpy.ndarray'
What is the right way to generate the location information explicitly from x1?
CodePudding user response:
Try stack, then take the index of the NaN entries:
df.stack(dropna=False).loc[lambda x : x!=x].index.tolist()
Out[115]: [(1, 'b'), (2, 'c')]
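The `x != x` test works because NaN is the only value that does not compare equal to itself. As a rough sketch of the same idea, assuming the `df` from the question, you could stack the boolean mask from `isna()` instead of the data. The mask contains no NaN, so nothing depends on the `dropna` argument. The names `mask` and `nan_cells` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3},
                   {'a': 10, 'b': np.nan, 'c': ""},
                   {'a': 10, 'b': "", 'c': np.nan}])

# Boolean Series indexed by (row label, column label).
mask = df.isna().stack()

# Keep only the True entries and read their index as a list of tuples.
nan_cells = mask[mask].index.tolist()
print(nan_cells)
```

Note that the empty strings are not reported, since `isna()` does not treat `""` as missing.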
CodePudding user response:
You could use numpy.where:
import numpy as np
null_indices, col_idx = np.where(df.isna())
null_columns = df.columns[col_idx]
Output:
(array([1, 2], dtype=int64), Index(['b', 'c'], dtype='object'))
If you want to see it as tuples, you can zip them:
out = list(zip(null_indices, null_columns))
Output:
[(1, 'b'), (2, 'c')]
For your specific code, since x1 is a tuple of arrays, you need to unpack it inside zip, like:
out = list(zip(*x1))
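To spell out why the original call failed: `zip(x1)` iterates over the tuple itself, so it yields one-element tuples that each hold a whole array, and `set()` cannot hash an array. A small sketch, assuming the `df` and `x1` from the question (the names `positions` and `labels` are only illustrative), that unpacks `x1` and then maps the positions back to labels:

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3},
                   {'a': 10, 'b': np.nan, 'c': ""},
                   {'a': 10, 'b': "", 'c': np.nan}])

x1 = np.where(pd.isnull(df))  # (row positions, column positions)

# Unpacking with * pairs each row position with its column position.
# int() converts the NumPy integers to plain Python ints.
positions = [(int(r), int(c)) for r, c in zip(*x1)]
print(positions)

# Translate positional indices into index and column labels.
labels = [(df.index[r], df.columns[c]) for r, c in positions]
print(labels)
```

The tuples in `positions` are hashable, so `set(positions)` works where `set(zip(x1))` did not.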