drop rows from pandas dataframe using for loop and if statements

Time:12-15

I am trying to clean a dataset, but I run into an error where "Red" is not recognised, and I am not sure I have written the function correctly. Ideally, I want to drop rows based on length tolerances per colour. I am trying to write a function that takes a colour, an upper tolerance and a lower tolerance, and removes the rows of that colour whose length falls outside those tolerances.

Thanks!

import pandas as pd

df = pd.DataFrame(
    {
        "Colour": [
            "Red",
            "Red",
            "Red",
            "Red",
            "Red",
            "Blue",
            "Blue",
            "Blue",
            "Green",
            "Green",
            "Green",
        ],
        "Length": [14, 15, 16, 20, 15, 15, 18, 17, 15, 19, 18],
    }
)


def tolerance_drop(Colour, Upper, Lower):
    for i in range(0, len(df)):
        if (df.loc[i, "Colour"] == Colour) & (df.loc[i, "Length"] > Upper):
            df.drop([i])
        elif (df.loc[i, "Colour"] == Colour) & (df.loc[i, "Length"] < Lower):
            df.drop([i])
        else:
            break
        
# should remove 2 red rows giving 9 remaining rows
tolerance_drop("Red", 19.150, 14.5)

print(df)


Output:

    It simply prints the DataFrame unchanged. No rows are deleted.

CodePudding user response:

Avoid explicit loops when a vectorized pandas operation will do the job.
A simple boolean filter is enough here:

In [466]: df = df[~((df.Colour == 'Red') & ((df.Length > 19.150) | (df.Length < 14.5)))]

In [467]: df
Out[467]: 
   Colour  Length
1     Red      15
2     Red      16
4     Red      15
5    Blue      15
6    Blue      18
7    Blue      17
8   Green      15
9   Green      19
10  Green      18
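Since the question asks for a reusable function, the same filter could be wrapped up roughly like this. This is only a sketch: passing the DataFrame in as a parameter and returning a new one (instead of modifying a global df) is my own choice, not something from the question.

```python
import pandas as pd


def tolerance_drop(df, colour, upper, lower):
    """Return a new DataFrame without the rows of `colour` whose Length
    is above `upper` or below `lower`. The input frame is not modified."""
    is_colour = df["Colour"] == colour
    out_of_tolerance = (df["Length"] > upper) | (df["Length"] < lower)
    # Keep every row that is NOT (matching colour AND out of tolerance)
    return df[~(is_colour & out_of_tolerance)]


df = pd.DataFrame(
    {
        "Colour": ["Red"] * 5 + ["Blue"] * 3 + ["Green"] * 3,
        "Length": [14, 15, 16, 20, 15, 15, 18, 17, 15, 19, 18],
    }
)

cleaned = tolerance_drop(df, "Red", 19.150, 14.5)
print(len(cleaned))  # 9
```

Because the function returns a new frame, calls can be chained per colour, e.g. cleaned = tolerance_drop(cleaned, "Blue", 17.5, 14.5), where the Blue tolerances are made-up values for illustration.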

CodePudding user response:

As pointed out in the comments, there are better ways of doing this.

But if you are learning and want to know why your function doesn't work, you should try this:

def tolerance_drop(Colour, Upper, Lower):
    for i in range(0, len(df)):
        if df.loc[i, "Colour"] == Colour and (df.loc[i, "Length"] > Upper or df.loc[i, "Length"] < Lower):
            df.drop([i], inplace=True)

tolerance_drop("Red", 19.150, 14.5)

print(df)

In your version, the break statement exits the for loop the first time a row matches neither condition (here the second row, Red with length 15), so the remaining rows are never checked. You don't want that.

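In Python, & is the bitwise operator. It happens to work on parenthesised boolean values, but for combining single True/False conditions the idiomatic choice is and/or. The & and | operators are what you use to combine whole pandas Series element by element.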
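To make the difference concrete, here is a small sketch with made-up data: & and | for whole Series, and/or for single values.

```python
import pandas as pd

lengths = pd.Series([14, 15, 20])

# Elementwise: | (and &) combine boolean Series; the parentheses are required
# because these operators bind more tightly than the comparisons.
out_of_range = (lengths > 19.15) | (lengths < 14.5)
print(out_of_range.tolist())  # [True, False, True]

# Scalar: and/or combine single True/False values
value = lengths[0]
first_out_of_range = bool(value > 19.15 or value < 14.5)
print(first_out_of_range)  # True
```

Using and/or on the two Series directly would raise a ValueError, because the truth value of a whole Series is ambiguous.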

When you drop a row, the result is not saved back into the same variable: df.drop returns a new DataFrame and leaves the original untouched, unless you pass the inplace=True argument.

Output:

   Colour  Length
1     Red      15
2     Red      16
4     Red      15
5    Blue      15
6    Blue      18
7    Blue      17
8   Green      15
9   Green      19
10  Green      18
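If you want to see the drop behaviour in isolation, a tiny example (with made-up data) shows that drop leaves the original frame alone unless you ask otherwise:

```python
import pandas as pd

small = pd.DataFrame({"Colour": ["Red", "Red", "Blue"], "Length": [14, 15, 18]})

# drop returns a new DataFrame; `small` itself still has all three rows
dropped = small.drop([0])
print(len(small), len(dropped))  # 3 2

# With inplace=True the frame is modified directly and drop returns None
result = small.drop([0], inplace=True)
print(len(small), result)  # 2 None
```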