I want to remove entire rows if all columns except the one is empty. So, imagine that my DataFrame is
df = pd.DataFrame({"col1": ["s1", "s2", "s3", "s4", "s5", "s6"],
"col2": [41, np.nan, np.nan, np.nan, np.nan, 61],
"col3": [24, 51, np.nan, np.nan, np.nan, 84],
"col4": [53, 64, 81, np.nan, np.nan, np.nan],
"col5": [43, 83, 47, 12, np.nan, 19]})
which looks like this
col1 col2 col3 col4 col5
0 s1 41 24 53 43
1 s2 NaN 51 64 83
2 s3 NaN NaN 81 47
3 s4 NaN NaN NaN 12
4 s5 NaN NaN NaN NaN
5 s6 61 84 NaN 19
In this example, the desired result is
col1 col2 col3 col4 col5
0 s1 41 24 53 43
1 s2 NaN 51 64 83
2 s3 NaN NaN 81 47
3 s4 NaN NaN NaN 12
4 s6 61 84 NaN 19
which means that I want to remove the last row. I initially tried with df.dropna(how="all")
but it does not work since the last row is not entirely empty (s5
in the col1
).
How can I solve this?
CodePudding user response:
Use the thresh
parameter:
N = 1
df.dropna(thresh=N 1)
Or if you want to match exactly N
NAs (no more no less):
N = 1
out = df[df.isna().sum(axis=1).ne(df.shape[1]-N)]
Output:
col1 col2 col3 col4 col5
0 s1 41.0 24.0 53.0 43.0
1 s2 NaN 51.0 64.0 83.0
2 s3 NaN NaN 81.0 47.0
3 s4 NaN NaN NaN 12.0
CodePudding user response:
df[df.iloc[:, 1:].notnull().any(axis=1)]
CodePudding user response:
You should use threshold in dropna.
df = df.dropna(axis=0, thresh=2)
CodePudding user response:
You can also try this one to check if the element is a NaN
.
np.isnan()
Here is the official documentation for more info. https://numpy.org/doc/stable/user/misc.html
CodePudding user response:
As an alternative to given answers and if you want to use .dropna()
you may set col1
as your index with:
df = df.set_index("col1")
This way df.dropna(how='all')
will work like a charm.
If you do not need it as index anymore you may get your column back by df['col1'] = df.index
and reset index df.reset_index(drop=True)
col1
will appear after col5
, you may rearrange it back with:
cols = df.columns.tolist()
cols = cols[-1:] cols[:-1]
df[cols]
CodePudding user response:
import numpy as np
import pandas as pd
df = pd.DataFrame({"x1": [np.nan, np.nan], "x2": [1, np.nan]})
print(df.head())
for idx, row in df.iterrows():
if np.isnan(row).all():
df = df.drop(idx)
print(df.head())
Added in edit:
Simply drop the irrelevant columns from the row count.
import numpy as np
import pandas as pd
df = pd.DataFrame({"name": ["keep", "remove"], "x1": [np.nan, np.nan], "x2": [1, np.nan]})
print("ORG")
print(df.head())
for idx, row in df.iterrows():
if np.isnan(row[1:].astype(float)).all():
df = df.drop(idx)
print("OUT")
print(df.head())