Dropping rows that have only one non-zero value from a pandas DataFrame in Python

Time:04-15

I have a pandas dataframe as shown below:

[image: Pandas Dataframe]

I want to drop the rows that have only one non-zero value. What's the most efficient way to do this?

CodePudding user response:

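Try boolean indexing:

import numpy as np
import pandas as pd

# sample data: a 10x10 frame of zeros
df = pd.DataFrame(np.zeros((10, 10)), columns=list('abcdefghij'))
df.iloc[2:5, 3] = 1   # rows 2-4 get a non-zero value in column 'd'
df.iloc[4:5, 4] = 1   # row 4 gets a second non-zero value in column 'e'

# keep the rows whose count of non-zero values is not exactly 1
df[df.ne(0).sum(axis=1).ne(1)]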

Rows 2 and 3 each have exactly one non-zero value, so they are dropped. Row 4 has two non-zero values and every other row has none, so those rows are kept.

df.ne(0).sum(axis=1)

0    0
1    0
2    1
3    1
4    2
5    0
6    0
7    0
8    0
9    0
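
If you need this in more than one place, the same mask can be wrapped in a small helper. This is only a sketch: the function name drop_single_nonzero_rows is made up for illustration, and it swaps in np.count_nonzero on the underlying array instead of df.ne(0).sum(axis=1), assuming an all-numeric frame.

```python
import numpy as np
import pandas as pd


def drop_single_nonzero_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Return frame without the rows that hold exactly one non-zero value."""
    # Count non-zero entries per row on the underlying NumPy array.
    nonzero_per_row = np.count_nonzero(frame.to_numpy(), axis=1)
    # Keep rows whose count is anything other than 1 (all-zero rows stay).
    return frame.loc[nonzero_per_row != 1]


# Same sample data as in the answer above.
df = pd.DataFrame(np.zeros((10, 10)), columns=list("abcdefghij"))
df.iloc[2:5, 3] = 1
df.iloc[4:5, 4] = 1

result = drop_single_nonzero_rows(df)
print(result.index.tolist())  # [0, 1, 4, 5, 6, 7, 8, 9]
```

The original df is not modified; assign the result back if you want to replace it.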

CodePudding user response:

Not sure if this is the most efficient, but I'll try:

df[[(row != 0).sum() != 1 for _, row in df.iterrows()]]

This makes two passes over each row: one to compare against 0 and one to sum the boolean values. Neither pass can stop early once a second non-zero value is found.

Otherwise, you can define a custom function that checks each row in a single pass and stops early:

def check(row):
    # True only if the row holds exactly one non-zero value
    already_has_one = False
    for value in row:
        if value != 0:
            if already_has_one:
                return False  # second non-zero value found, stop early
            already_has_one = True
    return already_has_one

then:

df[[not check(row) for _, row in df.iterrows()]]

This can be faster than the first version when rows are long and a second non-zero value shows up early. A Python-level loop is often slower than vectorized pandas operations, though, so time both on your data.
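
Whether an early-exit loop actually beats a vectorized count depends on the shape and sparsity of the data, so it is worth timing both. Below is a rough sketch of how to do that. The random test frame, its size, and the function names are all made up for illustration, and the check is applied row by row since the question is about rows.

```python
import timeit

import numpy as np
import pandas as pd


def has_exactly_one_nonzero(values) -> bool:
    """Early-exit check: True only if exactly one entry is non-zero."""
    seen_one = False
    for value in values:
        if value != 0:
            if seen_one:
                return False  # second non-zero value, no need to look further
            seen_one = True
    return seen_one


# Hypothetical test data: 2,000 rows x 20 columns, roughly 5% non-zero.
rng = np.random.default_rng(0)
data = rng.random((2_000, 20))
data[data < 0.95] = 0.0
df = pd.DataFrame(data)


def vectorized_mask():
    # True for rows to keep: non-zero count is not exactly 1.
    return df.ne(0).sum(axis=1).ne(1).to_numpy()


def loop_mask():
    # Same mask, built with the Python-level early-exit check.
    return np.array([not has_exactly_one_nonzero(row) for row in df.to_numpy()])


# Both approaches must select the same rows before timing means anything.
assert (vectorized_mask() == loop_mask()).all()

print("vectorized :", timeit.timeit(vectorized_mask, number=20))
print("python loop:", timeit.timeit(loop_mask, number=20))
```

The timings will vary by machine and pandas version, so treat them as a relative comparison only.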

CodePudding user response:

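Or like this, counting the truthy (non-zero) values in each row:

df[(df.applymap(bool).sum(axis=1) != 1).values]

On pandas 2.1 and later, applymap is deprecated; df.map(bool) or df.astype(bool) gives the same mask.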
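
One thing to watch with a count-based mask is the comparison operator: keeping rows with a count > 1 also throws away all-zero rows, while != 1 removes only the rows with exactly one non-zero value. A small sketch of the difference, using a made-up 5x4 frame and astype(bool) in place of applymap:

```python
import numpy as np
import pandas as pd

# Made-up sample: mostly zeros, one row with one non-zero, one row with two.
df = pd.DataFrame(np.zeros((5, 4)), columns=list("abcd"))
df.iloc[1, 0] = 1        # row 1: one non-zero value
df.iloc[2, [0, 1]] = 1   # row 2: two non-zero values

# Non-zero values become True, so the row sum is the non-zero count.
nonzero_count = df.astype(bool).sum(axis=1)

kept_gt = df[nonzero_count > 1]    # keeps only rows with 2+ non-zero values
kept_ne = df[nonzero_count != 1]   # drops only rows with exactly one

print(kept_gt.index.tolist())  # [2]
print(kept_ne.index.tolist())  # [0, 2, 3, 4]
```

Pick whichever matches what you want to happen to the all-zero rows.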