Using the zip function to generate cell locations from row and column information

Time:03-26

I created the following DataFrame and would like to identify the cells that are null:

import pandas as pd
import numpy as np
data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': np.nan, 'c': ""},
        {'a': 10, 'b': "", 'c': np.nan}]
df = pd.DataFrame(data)

     a    b     c
0    1    2     3
1   10   NaN    
2   10         NaN

I used x1 = np.where(pd.isnull(df)) and got the following result:

print(x1)
(array([1, 2], dtype=int64), array([1, 2], dtype=int64))

However, I want to generate the cell location explicitly for each NaN entry. I tried the zip function, but got the following error message:

print(set(zip(x1)))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [41], in <cell line: 1>()
----> 1 print(set(zip(x1)))

TypeError: unhashable type: 'numpy.ndarray'

What is the right way to generate the location information explicitly from x1?

CodePudding user response:

Try stack, then take the index of the NaN entries (x != x is True only for NaN, because NaN never compares equal to itself):

df.stack(dropna=False).loc[lambda x : x!=x].index.tolist()
Out[115]: [(1, 'b'), (2, 'c')]
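
A variant of the same idea, offered only as a sketch: stack the boolean mask rather than the data, so the result does not depend on the dropna argument (the pandas documentation notes that dropna must stay unspecified when future_stack=True). The names mask and nan_cells are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3},
                   {'a': 10, 'b': np.nan, 'c': ""},
                   {'a': 10, 'b': "", 'c': np.nan}])

# isna() yields a boolean frame with no missing values, so stack()
# keeps every cell regardless of how it treats NaN.
mask = df.isna().stack()

# Keep only the True entries; the MultiIndex holds (row label, column label).
nan_cells = mask[mask].index.tolist()
print(nan_cells)
```

This should give the same (row, column) pairs as the output above.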

CodePudding user response:

You could use numpy.where:

import numpy as np
null_indices, col_idx = np.where(df.isna())
null_columns = df.columns[col_idx]

Output of (null_indices, null_columns):

(array([1, 2], dtype=int64), Index(['b', 'c'], dtype='object'))

If you want to see it as tuples, you can zip:

out = list(zip(null_indices, null_columns))

Output:

[(1, 'b'), (2, 'c')]
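
Note that np.where returns positions, which coincide with the row labels here only because the DataFrame has the default 0, 1, 2 index. A minimal sketch of a helper that maps both axes back to labels, assuming a custom index; nan_locations and df_labeled are hypothetical names introduced for illustration:

```python
import numpy as np
import pandas as pd

def nan_locations(frame):
    """Return (row label, column label) pairs for every NaN cell."""
    row_pos, col_pos = np.where(frame.isna())
    # np.where gives positions, so map both axes back to labels.
    return list(zip(frame.index[row_pos], frame.columns[col_pos]))

# Same data as the question, but with string row labels instead of 0, 1, 2.
df_labeled = pd.DataFrame(
    [{'a': 1, 'b': 2, 'c': 3},
     {'a': 10, 'b': np.nan, 'c': ""},
     {'a': 10, 'b': "", 'c': np.nan}],
    index=['x', 'y', 'z'],
)
locations = nan_locations(df_labeled)
print(locations)  # [('y', 'b'), ('z', 'c')]
```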

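For your specific code: x1 is a tuple of two arrays, so zip(x1) yields one-element tuples that each wrap a whole array, and set cannot hash an array. Unpack the tuple inside zip so that the two arrays are paired element by element: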

out = list(zip(*x1))
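
To make the difference concrete, here is a small sketch that reproduces the error and the fix. It builds a stand-in x1 by hand instead of calling np.where on the DataFrame; the names wrapped, raised and pairs are illustrative only.

```python
import numpy as np

# Stand-in for the x1 from the question: a tuple of two position arrays.
x1 = (np.array([1, 2]), np.array([1, 2]))

# zip(x1) walks the outer tuple, so each item is a 1-tuple holding a whole array.
wrapped = list(zip(x1))
print(len(wrapped), type(wrapped[0][0]).__name__)  # 2 ndarray

# Building a set must hash those arrays, which is what raises the TypeError.
try:
    set(zip(x1))
    raised = False
except TypeError:
    raised = True

# zip(*x1) passes the two arrays as separate arguments and pairs them elementwise.
# int() turns NumPy integers into plain Python ints for cleaner printing.
pairs = {(int(r), int(c)) for r, c in zip(*x1)}
print(pairs)
```

The resulting pairs are positional (row, column) indices; index them into df.columns, as in the answer above, if you need column names instead.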