How to detect an error in a logical sequence in a column of a DataFrame?

Time:09-28

I have a dataframe:

import pandas as pd
inp = [{'id':1, 'c2':100}, {'id':2,'c2':110}, {'id':4,'c2':120}]
df = pd.DataFrame(inp)

The id column must be a consecutive sequence: 1, 2, 3, 4, and so on.

The id column can contain an arbitrary number of values.

What is the simplest way to detect the bad integer in the 3rd row (id = 4)? It should be id = 3.

Thank you

CodePudding user response:

If possible, check the difference between consecutive values of id and test whether it is greater than 1:

df = df[df['id'].diff().gt(1)]
print (df)
   id   c2
2   4  120
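
Note that gt(1) only flags gaps. As a rough sketch, assuming that repeated or decreasing ids should also count as errors (the sample data here is made up for illustration and is not from the question), one could flag every step that is not exactly 1:

```python
import pandas as pd

# Hypothetical data: a gap (2 -> 4), a repeat (4 -> 4) and a step back (4 -> 3).
df = pd.DataFrame({'id': [1, 2, 4, 4, 3], 'c2': [100, 110, 120, 130, 140]})

# Difference between each id and the previous one; NaN for the first row.
step = df['id'].diff()

# Keep rows whose step is not exactly 1. The first row has no predecessor,
# so it is excluded explicitly via notna().
bad = df[step.ne(1) & step.notna()]
print(bad)
```

This compares each row only with the row before it, so after one bad value every following row is judged relative to its own predecessor, not to the start of the sequence.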

If the DataFrame has the default RangeIndex and id starts at 1:

df = df[df['id'].sub(df.index) != 1]
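
If it helps to see which value was expected, a minimal sketch along the same lines is shown here. It assumes the ids should start at 1 and increase by 1 per row; the column name expected_id is a hypothetical name chosen for illustration:

```python
import pandas as pd

inp = [{'id': 1, 'c2': 100}, {'id': 2, 'c2': 110}, {'id': 4, 'c2': 120}]
df = pd.DataFrame(inp)

# Expected id for each row: its 1-based position in the frame.
# Built on df.index so that it aligns row by row with df['id'].
expected = pd.Series(range(1, len(df) + 1), index=df.index)

# Show the offending rows together with the value that was expected.
# 'expected_id' is an illustrative column name, not part of the original data.
mismatch = df.assign(expected_id=expected)[df['id'] != expected]
print(mismatch)
```

Because the expected values are built from row positions, this variant does not depend on the index labels themselves.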

CodePudding user response:

Try this:

>>> (df.index + 1) == df['id']
array([ True,  True, False])

>>> df[~((df.index + 1) == df['id'])]

    id  c2
2   4   120