Python Pandas delete row based on specific condition where list indexing is required

Time:10-18

I have a DataFrame like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [[np.nan, np.nan, 0], [np.nan, 1, 1], [1, np.nan, 2], [np.nan, np.nan, 3]]})
print(df)

           A
0  [nan, nan, 0]
1    [nan, 1, 1]
2    [1, nan, 2]
3  [nan, nan, 3]

Now I want to remove the rows where the first two elements of the list are both NaN, to get this:

           A
1    [nan, 1, 1]
2    [1, nan, 2]

I tried:

df.drop(df[np.isnan(df.A[0]) & np.isnan(df.A[1])].index)

But of course it doesn't work. How can I achieve this while keeping the values as lists, without splitting them into separate columns?
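Here is a short sketch of why the attempt fails. It reuses the question's df. `df.A[0]` selects the row labelled 0, which is the whole first list. As a result, `np.isnan(df.A[0]) & np.isnan(df.A[1])` is a three-element array with one entry per list element. It is not a four-element mask with one entry per row, so indexing df with it fails. The `.str` accessor also works on columns of lists, and it indexes inside each list instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [[np.nan, np.nan, 0], [np.nan, 1, 1], [1, np.nan, 2], [np.nan, np.nan, 3]]})

# df.A[0] looks up the row labelled 0, so it returns the whole first list
whole_row = df.A[0]
print(whole_row)  # [nan, nan, 0]

# The attempted mask therefore has one entry per list element, not per row
wrong_mask = np.isnan(df.A[0]) & np.isnan(df.A[1])
print(len(wrong_mask), len(df))  # 3 4

# .str[0] indexes into every list and yields one value per row
first_elems = df.A.str[0]
print(first_elems.isna().tolist())  # [True, True, False, True]
```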

User response:

Fix your code by adding the .str accessor, which indexes into each list instead of selecting a row:

df = df.drop(df[np.isnan(df.A.str[0]) & np.isnan(df.A.str[1])].index)  # drop rows whose first two elements are NaN
print(df)
             A
1  [nan, 1, 1]
2  [1, nan, 2]
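The same idea can also be written with `Series.isna` and boolean indexing in place of `np.isnan` and `drop`. This is a sketch of a variant, not the answerer's code. `.isna()` works on any dtype, whereas `np.isnan` raises a TypeError on object-dtype input. Keeping rows with a mask also avoids dropping by label: if the index had duplicate labels, `drop` would remove every row that shares a label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [[np.nan, np.nan, 0], [np.nan, 1, 1], [1, np.nan, 2], [np.nan, np.nan, 3]]})

# True where both the first and the second list element are missing
both_missing = df["A"].str[0].isna() & df["A"].str[1].isna()

# Keep the remaining rows directly instead of dropping by index label
result = df[~both_missing]
print(result)
```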

User response:

I would convert the column to an intermediate DataFrame, take its first two columns, and count the non-null values along axis=1. Rows with a count of 0 are the ones where both of the first two values are NaN, so keep the rows with a non-zero count:

m = pd.DataFrame(df['A'].tolist(), index=df.index).iloc[:, :2].count(axis=1) != 0  # True if at least one of the first two values is non-null
df[m]

             A
1  [nan, 1, 1]
2  [1, nan, 2]
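The function below generalises the intermediate-DataFrame approach to any number of leading elements. `drop_leading_nan_rows` is a hypothetical name, and the string index is an assumed example. The sketch makes two choices of its own. First, it builds the temporary frame with `index=df.index`, so the mask stays aligned when df does not use a default RangeIndex. Second, it tests `isna().all(axis=1)` on the first n columns. That test is equivalent to `count(axis=1) == 0`, but it reads closer to the condition in the question.

```python
import numpy as np
import pandas as pd


def drop_leading_nan_rows(df: pd.DataFrame, col: str, n: int = 2) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose list in `col` starts with n missing values."""
    # Expand the lists into a temporary frame that shares df's index,
    # so the boolean mask lines up with df even for a non-default index
    expanded = pd.DataFrame(df[col].tolist(), index=df.index)
    # A row is dropped when all of its first n expanded values are NaN
    all_missing = expanded.iloc[:, :n].isna().all(axis=1)
    return df[~all_missing]


# The question's data, under an assumed non-default string index
df = pd.DataFrame(
    {"A": [[np.nan, np.nan, 0], [np.nan, 1, 1], [1, np.nan, 2], [np.nan, np.nan, 3]]},
    index=["w", "x", "y", "z"],
)
print(drop_leading_nan_rows(df, "A"))
```

With the default n=2, only the rows labelled x and y remain. With n=1, only y remains, because it is the only list whose first element is not NaN.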

User response:

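Try this, which checks each of the first two elements with pd.isna:

m = df['A'].apply(lambda x: all(pd.isna(v) for v in x[:2]))  # True when the first two elements are both NaN
df[~m]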

Output:

              A
1   [nan, 1, 1]
2   [1, nan, 2]
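A comparison such as `x[:2] == [np.nan, np.nan]` only matches because of how Python compares lists. Each pair of items is first checked for identity, and only then compared with `==`. Every NaN in the question's data is the `np.nan` object itself, so the comparison succeeds there. A NaN that is a different float object never compares equal to anything, though. The sketch below simulates such a value with `float("nan")`, an assumption about how these values might arise in other data. It shows the difference, along with a `pd.isna` check that does not depend on identity.

```python
import numpy as np
import pandas as pd

# Built from the np.nan singleton, as in the question
same_obj = [np.nan, np.nan, 0]
# Built from fresh float NaN objects, as other code might produce them
fresh = [float("nan"), float("nan"), 0]

# List == checks identity first, so the np.nan singleton matches itself
print(same_obj[:2] == [np.nan, np.nan])  # True
# A different NaN object compares unequal, because nan != nan
print(fresh[:2] == [np.nan, np.nan])  # False

# pd.isna tests each value for missingness, independent of object identity
print(all(pd.isna(v) for v in fresh[:2]))  # True
```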