Deleting a row which has matching attributes to a row with mean 0-CodePudding

basically trying to do some data cleaning in Jupyter Notebook and i am not the best with the syntax so have reached a roadblock. I have a column at the end with mean and if the mean is 0, i would like to delete that row as well as another row which is either above or below that has the same 'Customer' and 'Date' column values.

Example code i am trying to perform this on

So far i am trying something across these lines but with little success where both_index is my variable name for the DataFrame

for i in both_index['peak mean']: #this is part of code below   

    if i == 0:
        for j in both_index['peak mean']:
            if both_index['Customer'][i] == both_index['Customer'][j]: #this is wrong and i dont know the syntax
                both_index.drop(j)
        both_index.drop(i)

Any help would be appreciated thank you!

CodePudding user response：

I'm not too familiar with pandas or jupyter, but I found another question with an answer that looks promising. It mentioned a way to drop rows with a condition like this:

DF.drop(DF[DF.LABEL CONDITION].index, inplace=True)

In your case, it would look like this:

both_index.drop(both_index[df['peak mean'] == 0].index, inplace=True)

Backup your data if you can before trying this.

EDIT: Just noticed that this answer isn't complete. Won't delete the extra rows with the same 'Customer' and 'Date'

CodePudding user response：

both_index = both_index.set_index(['Customer', 'date'])
df1 = both_index[both_index['peak mean'] == 0]
both_index = both_index.loc[~both_index.index.isin(df1.index)]

CodePudding user response：

You could try this:

df = pd.DataFrame({'Customer': ['#1','#1','#1','#2','#2','#3','#3'], 'Date': [1,1,2,2,2,3,3], 'peak mean': [1,0,2,2,3,0,4]})'peak mean': [1,0,2,2,3,0,4]})
df = df.set_index(['Customer', 'Date'])    #do not miss this important step!
print(df)

               peak mean
Customer Date           
#1       1             1    #to delete
#1       1             0    #to delete
#1       2             2
#2       2             2
#2       2             3
#3       3             0    #to delete
#3       3             4    #to delete

In this example, we expect to keep only 1 row of customer #1, and also both rows of customer #2

df1 = df.drop(df.loc[df['peak mean']==0].index)
print(df1)

               peak mean
Customer Date           
#1       2             2
#2       2             2
#2       2             3