I am trying to generate a new column containing boolean values that indicate whether each value in a row is null. I wrote the following function:
def not_null(row):
    null_list = []
    for value in row:
        null_list.append(pd.isna(value))
    return null_list
df['not_null'] = df.apply(not_null, axis=1)
But I get the following warning message,
A value is trying to be set on a copy of a slice from a DataFrame.
Is there a better way to write this function?
Note: I want to be able to apply this function to each row without needing to know the column names.
Desired output:
Column1 | Column2 | Column3 | null_idx
NaN     | NaN     | NaN     | [0, 1, 2]
1       | 23      | 34      | []
test1   | NaN     | NaN     | [1, 2]
CodePudding user response:
First, the warning means the DataFrame was filtered earlier in your code, so you need DataFrame.copy:
df = df[df['col'].gt(100)].copy()
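To make the point about DataFrame.copy concrete, here is a minimal sketch. The frame `raw`, its columns `col` and `other`, and the new column `has_null` are made up for illustration and are not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's original DataFrame
raw = pd.DataFrame({"col": [50, 150, 200], "other": [np.nan, 1.0, np.nan]})

# Filter first, then take an explicit copy so later assignments
# target an independent DataFrame rather than a possible view of `raw`
filtered = raw[raw["col"].gt(100)].copy()

# Adding a column to the copy leaves `raw` unchanged
filtered["has_null"] = filtered.isna().any(axis=1)
```

Without the `.copy()` call, the same assignment is the kind of operation that can trigger the warning quoted in the question.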
Then your solution should be improved:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[np.nan, 1, np.nan],
                   'b':[np.nan,4,6],
                   'c':[4,5,3]})
df['list_boolean_for_missing'] = [x[x].tolist() for x in df.isna().to_numpy()]
print (df)
     a    b  c  list_boolean_for_missing
0  NaN  NaN  4              [True, True]
1  1.0  4.0  5                        []
2  NaN  6.0  3                    [True]
Your function, rewritten as a list comprehension:
dd = lambda x: [pd.isna(value) for value in x]
df['list_boolean_for_missing'] = df.apply(dd, axis=1)
If you need a single boolean per row, which is one reading of the question ("I am trying to generate a new column containing boolean values of whether a value of each row is Null or not"):
df['not_null'] = df.notna().all(axis=1)
print (df)
     a    b  c  not_null
0  NaN  NaN  4     False
1  1.0  4.0  5      True
2  NaN  6.0  3     False
EDIT: For a list of positions, create a helper array with np.arange and filter it with each row's boolean mask:
arr = np.arange(len(df.columns))
df['null_idx'] = [arr[x].tolist() for x in df.isna().to_numpy()]
print (df)
     a    b  c  null_idx
0  NaN  NaN  4    [0, 1]
1  1.0  4.0  5        []
2  NaN  6.0  3       [0]
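As a rough sketch, the np.arange approach can be wrapped in a small helper and run against the sample data from the question. The function name `add_null_idx` and its `out_col` parameter are hypothetical, chosen only for this example:

```python
import numpy as np
import pandas as pd

def add_null_idx(frame, out_col="null_idx"):
    """Return a copy of `frame` with a column listing the positions of null values per row."""
    mask = frame.isna().to_numpy()           # 2-D boolean array, one row per DataFrame row
    positions = np.arange(frame.shape[1])    # 0..n_columns-1, independent of column names
    result = frame.copy()                    # avoid modifying the caller's DataFrame
    # Boolean-index the position array with each row's mask
    result[out_col] = pd.Series(
        [positions[row].tolist() for row in mask], index=frame.index
    )
    return result

# Sample data matching the table in the question
df = pd.DataFrame({
    "Column1": [np.nan, 1, "test1"],
    "Column2": [np.nan, 23, np.nan],
    "Column3": [np.nan, 34, np.nan],
})
out = add_null_idx(df)
print(out["null_idx"].tolist())  # [[0, 1, 2], [], [1, 2]]
```

Because the helper works on positions rather than labels, it does not need to know the column names, which matches the note in the question.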