Home > database >  Python dataframe check if column value is greater than previous column
Python dataframe check if column value is greater than previous column

Time:01-13

On python im trying to check if from a day to the next one (column by column), by ID, the values, if not all equal to zero, are correctly incremented by one or if at some point the value goes back to 0, then the next day it is either still equal to zero or incremented by one.

I have a dataframe with multiple columns, the first column is named "ID". All other columns are integers which names represent a day followed by the next day like this :
see input table here

I need to check if for each ID :

  • all the columns (i.e for all of the days) are equal to 0 then create new column named "CHECK" which equals to 0 meaning there is no error ;

  • if not then look column after column and check if the value in the next column is greater than the previous column value (i.e the day before) and just incremented by 1 (e.g from 14 to 15 and not 14 to 16) then "CHECK" equals to 0 ;

  • if these conditions aren't satisfied it means that the next column is either equal to the previous column or lower (but not equal to zero) and in both cases it is an error then "CHECK" equals to 1 ;

    • But if the next value column is lower and equal to 0 then the next one to it must be either still 0 or incremented by 1. Each time it comes back to zero it must be followed by zero or incremented by 1

Each time there is a column in which the calculation comes back to zero, the next columns should either be equal to zero or getting 1 value.

If i explained everything correctly then in this example, the first two IDs are correct and their variable "CHECK" must be equal to 0 but the next ID should have a "CHECK" value 1.

I hope this is not confusing..thanks.

I tried this but would like to not use the column name but its index/position. Code not finished.`

df['check'] = np.where((df['20110531']<=df['20110530']) & ([df.columns!="ID"] != 0),1,0)

CodePudding user response:

You could write a simple function to go along each row and check for your condition. In the example code below, I first set the index to ID.

df = df.set_index('ID')

def func(r):
    start = r[0]
    for j,i in enumerate(r):
        if j == 0:
            continue
        if i == 0 or i == start  1:
            start = i
        else:
            return 1
    return 0
    
df['check'] = df.apply(func, axis = 1)

if you want to keep the original index the don't reset and use df['check'] = df.iloc[:,1:].apply(func, axis = 1)

  • Related