Example data:
dictionary = {'col1':[[1,2,3],[1,2,3],[1,2,3]],'col2': [[1,'nan',3],
[1,2,'nan'],[1,2,3]], 'col3': [[1,2,3],[1,2,3],[1,'nan',3]]}
df = pd.DataFrame(dictionary)
I have a DataFrame of lists and I want to count the number of NaN values by row:
col1 col2 col3
[1,2,3] [1,NaN,3] [1,2,3]
[1,2,3] [1,2,NaN] [1,2,3]
[1,2,3] [1,2,3] [1,NaN,3]
I used
acceptable_data_df.iloc[:, 1:].apply(lambda x: list(itertools.chain(*x)), axis=1)
to convert them to one list and hopefully make it easier but I'm still stuck. (First column was text)
[1,2,3,1,NaN,3,1,2,3]
[1,2,3,1,2,NaN,1,2,3]
[1,2,3,1,2,3,1,NaN,3]
How can I do this?
CodePudding user response:
You could stack
explode
isna
to get a Series where it's True for NaN and False otherwise. Then groupby
sum
fetches the number of NaN values per row:
df['number of NaN'] = df.stack().explode().isna().groupby(level=0).sum()
Output:
col1 col2 col3 number of NaN
0 [1, 2, 3] [1, nan, 3] [1, 2, 3] 1
1 [1, 2, 3] [1, 2, nan] [1, 2, 3] 1
2 [1, 2, 3] [1, 2, 3] [1, nan, 3] 1
CodePudding user response:
IIUC use:
df['count'] = (df.col1 df.col2 df.col3).apply(lambda x: pd.isna(x).sum())
CodePudding user response:
A "boring" way to count "NaN" while keeping the values as a string is to just count how many times the "NaN"
string appears:
(df["col1"] df["col2"] df["col3"]).str.count("NaN")