I have a DataFrame that is the result of some manipulations of two other DataFrames.
Note: PKEY_ID is the user-defined index.
Now I want another DataFrame that contains only the columns having at least one non-null value. Below is my code:
diff_col_lst = [col for col in df if ~df[col].isna().all()]
err_df = df[diff_col_lst]
Output
The resulting DataFrame is actually empty, since it has no columns at all, but this is what I get when I check its shape:
err_df.shape
Q-1: Although the DataFrame is empty, it reports 10 rows. How can I get the shape as (0, 0) when the DataFrame is empty, without explicitly checking df.empty and manipulating the shape?
Q-2: Can the code below be made more concise?
diff_col_lst = [col for col in df if ~df[col].isna().all()]
err_df = df[diff_col_lst]
CodePudding user response:
First, for Q2:
You could use
err_df = df.loc[:, df.notna().any()]
Note that df.any() on its own tests truthiness rather than nullness, so it would also drop non-null columns that contain only 0, False, or empty strings; calling notna() first avoids that.
If you look at the documentation of any, it says "Return whether any element is True, potentially over an axis." The default axis is 0 (the index), so it scans each column from top to bottom and returns True for that column if at least one element is True. Applied to df.notna(), this flags every column that has at least one non-null value. We don't need to pass axis=0, because it is the default.
You then use .loc to select the columns flagged by df.notna().any(); the : part says to keep all rows.
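To illustrate the column selection, here is a minimal sketch on made-up data (the column names a, b, c are hypothetical, not from the question). It compares a mask built with notna().any() against a plain df.any(), which tests truthiness rather than nullness, and also shows dropna(axis=1, how="all") as an equivalent one-liner.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" is entirely null, "b" holds only zeros, "c" is mixed.
df = pd.DataFrame(
    {
        "a": [np.nan, np.nan, np.nan],
        "b": [0.0, 0.0, 0.0],
        "c": [1.0, np.nan, 3.0],
    }
)

# Boolean mask per column: True if the column has at least one non-null value.
has_value = df.notna().any()

# Keep every row (:) and only the columns flagged True.
err_df = df.loc[:, has_value]

# Equivalent: drop the columns whose values are all null.
same_df = df.dropna(axis=1, how="all")

# Truthiness-based selection: 0 counts as False, so "b" is dropped as well.
truthy_df = df.loc[:, df.any()]

print(list(err_df.columns), list(same_df.columns), list(truthy_df.columns))
```

If the data can never contain zeros, False, or empty strings, the two masks select the same columns; otherwise the notna() version is the one that matches "has any non-null value".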
Q1: IIUC, you are keeping only the columns that have a non-null value. Selecting columns never drops rows, so the index (with its 10 rows) is retained even when no columns are left, which is why the shape is (10, 0).
If you want to check whether any columns remain using only shape, you can take the shape of the columns:
err_df.columns.shape
which should give (0,) in this case.
Or you could use size, which is the number of elements. In this case it returns 0:
err_df.size
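As a rough sketch of the situation in the question, the snippet below builds a made-up 10-row frame whose columns are all null (the column names x and y are hypothetical), filters it, and inspects the attributes mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 10 rows indexed by PKEY_ID, every column entirely null.
df = pd.DataFrame(
    {"x": [np.nan] * 10, "y": [np.nan] * 10},
    index=pd.RangeIndex(10, name="PKEY_ID"),
)

# No column has a non-null value, so no columns are selected.
err_df = df.loc[:, df.notna().any()]

frame_shape = err_df.shape        # the index survives, so rows are still counted
col_shape = err_df.columns.shape  # shape of the (empty) column index
n_elements = err_df.size          # rows * columns

print(frame_shape, col_shape, n_elements)
```

Here the frame's shape is (10, 0), while the column shape is (0,) and the size is 0, so either of the last two can serve as the emptiness signal.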