I have a function in Python pandas like the one below:
def my_func(df, col: str):
    if pd.isna(df[col]):
        return False
I call my function like this: df_result = my_func(df = my_df, col = "col1")
And I have a DataFrame like the one below, where col1 has a string data type:
col1
--------
NaN
ABC
NaN
How can I modify my function so that it returns 2 different DataFrames:
- one with the rows where col1 is NaN
- one with the rows where col1 is a value other than NaN
So I want to call my function like this: df_nan, df_not_nan = my_func(df = my_df, col = "col1")
where df_nan contains the rows where col1 is NaN and df_not_nan contains the rows where col1 is a value other than NaN.
df_nan:
col1
------
NaN
NaN
df_not_nan:
col1
-----
ABC
How can I modify my function in Python pandas?
CodePudding user response:
Use boolean indexing with ~ to invert the mask, here to select the rows with non-missing values:
print (my_df)
  col1  a
0  NaN  1
1  ABC  2
2  NaN  3
def my_func(df, col: str):
    m = df[col].isna()
    return df[m], df[~m]
df_nan, df_not_nan = my_func(df = my_df, col = "col1")
print (df_nan)
  col1  a
0  NaN  1
2  NaN  3
print (df_not_nan)
  col1  a
1  ABC  2
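For reference, the split above can be reproduced end to end with a minimal, self-contained sketch. The sample frame is rebuilt by hand on the assumption that it matches the printed my_df, and the name split_by_nan is hypothetical, used here only for illustration:

```python
import numpy as np
import pandas as pd

# Sample data, assumed to match the frame printed above.
my_df = pd.DataFrame({"col1": [np.nan, "ABC", np.nan], "a": [1, 2, 3]})


def split_by_nan(df: pd.DataFrame, col: str):
    """Return (rows where col is NaN, rows where col is not NaN)."""
    mask = df[col].isna()      # boolean Series, True where the value is missing
    return df[mask], df[~mask]  # ~ inverts the mask


df_nan, df_not_nan = split_by_nan(my_df, "col1")
print(df_nan.index.tolist())      # [0, 2]
print(df_not_nan.index.tolist())  # [1]
```

Both results keep the original index labels; if a fresh 0-based index is wanted, calling reset_index(drop=True) on each result should do it.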
If you need to test whether at least one missing value exists, add Series.any to avoid this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
def my_func1(df, col: str):
    if pd.isna(df[col]).any():
        return 'exist at least one missing values'
    else:
        return 'no missing values'
out = my_func1(df = my_df, col = "col1")
print (out)
exist at least one missing values
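To see why the original `if pd.isna(df[col]):` fails, here is a small sketch that triggers the error on purpose and then reduces the mask to a single boolean. The Series s is made-up sample data standing in for col1:

```python
import numpy as np
import pandas as pd

# Made-up sample column standing in for col1.
s = pd.Series([np.nan, "ABC", np.nan], name="col1")

error_message = None
try:
    # isna() returns one boolean per row, so `if` cannot pick a single answer.
    if s.isna():
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Reduce the boolean Series to one value before using it in a condition.
has_missing = bool(s.isna().any())  # True if at least one value is missing
n_missing = int(s.isna().sum())     # number of missing values
print(has_missing, n_missing)       # True 2
```

Series.all works the same way when the condition should hold only if every value is missing.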