Using the zip function to generate cell locations from row and column information

I created the following DataFrame and would like to identify the cells that are null:

import pandas as pd
import numpy as np
data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': np.nan, 'c': ""},   # np.nan: the np.NaN alias was removed in NumPy 2.0
        {'a': 10, 'b': "", 'c': np.nan}]
df = pd.DataFrame(data)

     a    b     c
0    1    2     3
1   10   NaN    
2   10         NaN

I used the following code, x1 = np.where(pd.isnull(df)), and got this result:

print(x1)
(array([1, 2], dtype=int64), array([1, 2], dtype=int64))

However, I want to generate the cell location explicitly for each NaN entry. I tried the zip function, but got the following error:

print(set(zip(x1)))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [41], in <cell line: 1>()
----> 1 print(set(zip(x1)))

TypeError: unhashable type: 'numpy.ndarray'

What is the right way to generate the cell locations explicitly from x1?

CodePudding user response:

Try stack, then keep the index entries whose value is NaN (x != x is True only for NaN, since NaN never equals itself):

df.stack(dropna=False).loc[lambda x : x!=x].index.tolist()
Out[115]: [(1, 'b'), (2, 'c')]
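
A variant of the same idea, as a minimal sketch: build the boolean mask with isna() first and stack that instead. Because the mask contains no NaN itself, this avoids the dropna argument altogether, which may help if your pandas version handles that argument differently. The names mask and nan_cells are only illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: two NaN cells and two empty strings
df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3},
                   {'a': 10, 'b': np.nan, 'c': ""},
                   {'a': 10, 'b': "", 'c': np.nan}])

# Boolean Series indexed by (row label, column label); True where the cell is NaN
mask = df.isna().stack()

# Keep only the True entries and read off their index as a list of tuples
nan_cells = mask[mask].index.tolist()
print(nan_cells)  # [(1, 'b'), (2, 'c')]
```

Note that the empty strings are not reported, since isna() does not treat "" as missing.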

CodePudding user response:

You could use numpy.where:

import numpy as np
null_indices, col_idx = np.where(df.isna())
null_columns = df.columns[col_idx]

Output of (null_indices, null_columns):

(array([1, 2], dtype=int64), Index(['b', 'c'], dtype='object'))

If you want the result as tuples, you can zip the two together:

out = list(zip(null_indices, null_columns))

Output:

[(1, 'b'), (2, 'c')]
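
One thing worth keeping in mind: np.where returns positions, not labels. With the default 0..n-1 index the two coincide for rows, but with any other index you would have to map the row positions through df.index as well. A small sketch, assuming a hypothetical string index 'x', 'y', 'z' that is not in the original question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'a': [1, 10, 10], 'b': [2, np.nan, ""], 'c': [3, "", np.nan]},
    index=['x', 'y', 'z'],  # hypothetical non-default row labels
)

# Integer positions of the NaN cells, in row-major order
row_pos, col_pos = np.where(df.isna().to_numpy())

# Purely positional pairs, converted to plain Python ints
positions = list(zip(row_pos.tolist(), col_pos.tolist()))

# Label pairs: look the positions up in the row index and the column index
labels = list(zip(df.index[row_pos], df.columns[col_pos]))

print(positions)  # [(1, 1), (2, 2)]
print(labels)     # [('y', 'b'), ('z', 'c')]
```

The positional pairs are what you would pass to df.iat, and the label pairs are what you would pass to df.at.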

For your specific code, since x1 is a tuple of two arrays, you need to unpack it inside zip so that the arrays are paired element by element:

out = list(zip(*x1))
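
To see why the original call failed, here is a short sketch comparing the two forms on the question's data. The variable names wrapped and pairs are only illustrative, and the int() conversion is an optional extra to get plain Python integers rather than NumPy scalars.

```python
import numpy as np
import pandas as pd

data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': np.nan, 'c': ""},
        {'a': 10, 'b': "", 'c': np.nan}]
df = pd.DataFrame(data)

# Tuple of two arrays: (row positions, column positions)
x1 = np.where(pd.isnull(df))

# zip(x1) iterates over the tuple itself, so each item is a 1-tuple
# wrapping a whole array; set() then fails because arrays are unhashable
wrapped = list(zip(x1))
print(len(wrapped), len(wrapped[0]))  # 2 1

# zip(*x1) passes the arrays as separate arguments and pairs them element-wise
pairs = [(int(r), int(c)) for r, c in zip(*x1)]
print(pairs)  # [(1, 1), (2, 2)]

# Tuples of ints are hashable, so building a set now works
unique_pairs = set(pairs)
```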