I have a DataFrame like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [[np.nan, np.nan, 0], [np.nan, 1, 1], [1, np.nan, 2], [np.nan, np.nan, 3]]})
print(df)
A
0 [nan, nan, 0]
1 [nan, 1, 1]
2 [1, nan, 2]
3 [nan, nan, 3]
Now I want to remove the rows where the first two elements of the list are both NaN, to get this:
A
1 [nan, 1, 1]
2 [1, nan, 2]
I tried:
df.drop(df[np.isnan(df.A[0]) & np.isnan(df.A[1])].index)
But of course it doesn't work. How can I achieve this while keeping the values as lists, without splitting the lists into separate columns?
CodePudding user response:
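Fix your code by adding the .str accessor, which indexes into each list element-wise (df.A[0] on its own selects the whole first row):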
df = df.drop(df[np.isnan(df.A.str[0]) & np.isnan(df.A.str[1])].index)
Out[20]:
A
1 [nan, 1, 1]
2 [1, nan, 2]
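To make the difference between df.A[0] and df.A.str[0] concrete, here is a minimal sketch on the question's data. The names first, second, both_nan and result are only for illustration, and I have swapped np.isnan for pd.isna and the drop call for plain boolean indexing, which should give the same rows here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [[np.nan, np.nan, 0], [np.nan, 1, 1], [1, np.nan, 2], [np.nan, np.nan, 3]]})

# df["A"][0] is the entire list stored in row 0,
# while df["A"].str[0] picks the first element of the list in every row.
first = df["A"].str[0]
second = df["A"].str[1]

# True for rows whose first two list elements are both missing.
both_nan = pd.isna(first) & pd.isna(second)

# Keep everything else; the lists in column A are left untouched.
result = df[~both_nan]
print(result)
```

Boolean indexing avoids the intermediate .index lookup that drop needs, but either form should work.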
CodePudding user response:
I would convert the column to an intermediate DataFrame and then count the non-null values along axis=1
to identify the rows where at least one of the first two values is non-null:
m = pd.DataFrame(df['A'].tolist()).iloc[:, :2].count(1) != 0
df[m]
A
1 [nan, 1, 1]
2 [1, nan, 2]
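For anyone who wants to see the intermediate steps, this is a rough sketch of the same idea spelled out. The names expanded, leading, counts and keep are made up for illustration. Passing index=df.index is my addition, assuming you may have a non-default index and want the mask to stay aligned with df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [[np.nan, np.nan, 0], [np.nan, 1, 1], [1, np.nan, 2], [np.nan, np.nan, 3]]})

# One column per list position; reuse the original index so the mask lines up with df.
expanded = pd.DataFrame(df["A"].tolist(), index=df.index)

# Only the first two list positions matter for the filter.
leading = expanded.iloc[:, :2]

# Number of non-null values among the first two positions, per row.
counts = leading.count(axis=1)

# Equivalent mask written the other way round: drop rows where both are missing.
keep = ~leading.isna().all(axis=1)

result = df[keep]
print(counts.tolist())
print(result)
```

The expanded frame is only used to build the mask, so column A in the result still holds the original lists.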
CodePudding user response:
Try this:
m = df['A'].apply(lambda x: x[:2] == [np.nan, np.nan])
df[~m]
Output:
A
1 [nan, 1, 1]
2 [1, nan, 2]
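One caveat with the list comparison above: NaN never equals itself, and the comparison only succeeds because Python checks element identity before equality and every NaN in the example is the same np.nan object. If the NaNs come from somewhere else (float("nan"), parsed data, and so on) the comparison can silently return False. A sketch of a variant using pd.isna, which I believe is more robust; the sample data with float("nan") is my own assumption to show the difference.

```python
import numpy as np
import pandas as pd

# Same np.nan object on both sides: the identity check makes the lists compare equal.
same_object = [np.nan, np.nan] == [np.nan, np.nan]

# Distinct NaN objects: identity fails, and nan == nan is False.
other_object = [float("nan"), float("nan")] == [np.nan, np.nan]

# Row 0 uses float("nan") instead of np.nan (hypothetical data for illustration).
df = pd.DataFrame({"A": [[float("nan"), float("nan"), 0], [np.nan, 1, 1], [1, np.nan, 2], [np.nan, np.nan, 3]]})

# pd.isna tests each element for missingness instead of comparing objects.
m = df["A"].apply(lambda x: bool(pd.isna(x[:2]).all()))

result = df[~m]
print(same_object, other_object)
print(result)
```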