Counting NaN values by row, in a List-CodePudding

Example data:

dictionary = {'col1':[[1,2,3],[1,2,3],[1,2,3]],'col2': [[1,'nan',3], 
[1,2,'nan'],[1,2,3]], 'col3': [[1,2,3],[1,2,3],[1,'nan',3]]}

df = pd.DataFrame(dictionary)

I have a DataFrame of lists and I want to count the number of NaN values by row:

col1     col2       col3   
[1,2,3]  [1,NaN,3]  [1,2,3]   
[1,2,3]  [1,2,NaN]  [1,2,3]   
[1,2,3]  [1,2,3]    [1,NaN,3]

I used

acceptable_data_df.iloc[:, 1:].apply(lambda x: list(itertools.chain(*x)), axis=1)

to convert them to one list and hopefully make it easier but I'm still stuck. (First column was text)

[1,2,3,1,NaN,3,1,2,3]   
[1,2,3,1,2,NaN,1,2,3]   
[1,2,3,1,2,3,1,NaN,3]

How can I do this?

CodePudding user response：

You could stack explode isna to get a Series where it's True for NaN and False otherwise. Then groupby sum fetches the number of NaN values per row:

df['number of NaN'] = df.stack().explode().isna().groupby(level=0).sum()

Output:

        col1         col2         col3  number of NaN
0  [1, 2, 3]  [1, nan, 3]    [1, 2, 3]              1
1  [1, 2, 3]  [1, 2, nan]    [1, 2, 3]              1
2  [1, 2, 3]    [1, 2, 3]  [1, nan, 3]              1

CodePudding user response：

IIUC use:

df['count'] = (df.col1   df.col2   df.col3).apply(lambda x: pd.isna(x).sum())

CodePudding user response：

A "boring" way to count "NaN" while keeping the values as a string is to just count how many times the "NaN" string appears:

(df["col1"]   df["col2"]   df["col3"]).str.count("NaN")