Identify values within threshold of others in group in pandas DataFrame

Time:04-09

My question is how to flag which values in the 'accuracy' column are within 1 of another value in the same 'vin' group. A value passes only if its 'vin' group has at least 2 values and at least one other value in that group differs from it by 1 or less; otherwise it fails.

Below is my Dataframe:

import pandas as pd

df = pd.DataFrame({'vin':['aaa','aaa','aaa','aaa','bbb','bbb','bbb','bbb','ccc','ccc','ccc','ddd'],
                   'accuracy':[1,2,3,9,22,23,211,212,34,39,40,55]})
df

My expected output is the 'Result' column below.

df = pd.DataFrame({'vin':['aaa','aaa','aaa','aaa','bbb','bbb','bbb','bbb','ccc','ccc','ccc','ddd'],
                   'value':[1,2,3,9,22,23,211,212,34,39,40,55],'Result':['pass','pass','pass','fail','pass','pass','pass','pass','fail','pass','pass','fail']})
df

output:

    vin  value Result
0   aaa      1   pass
1   aaa      2   pass
2   aaa      3   pass
3   aaa      9   fail
4   bbb     22   pass
5   bbb     23   pass
6   bbb    211   pass
7   bbb    212   pass
8   ccc     34   fail
9   ccc     39   pass
10  ccc     40   pass
11  ddd     55   fail

CodePudding user response:

Assuming the data is sorted by 'accuracy' within each 'vin', you can compute the diff per group, check that it is ≤ 1, then combine this mask with its per-group shift and pass the result to numpy.where:

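import numpy as np

# if not sorted
# df = df.sort_values(by=['vin', 'accuracy'])

# True where a value is within 1 of the previous value in its group
mask = df.groupby('vin')['accuracy'].diff().le(1)
# a row passes if it is close to its previous or its next neighbour
df['Result'] = np.where(mask | mask.groupby(df['vin']).shift(-1, fill_value=False), 'pass', 'fail')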

output:

    vin  accuracy Result
0   aaa         1   pass
1   aaa         2   pass
2   aaa         3   pass
3   aaa         9   fail
4   bbb        22   pass
5   bbb        23   pass
6   bbb       211   pass
7   bbb       212   pass
8   ccc        34   fail
9   ccc        39   pass
10  ccc        40   pass
11  ddd        55   fail
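
For reuse, the same idea can be wrapped in a small helper. This is only a sketch: the function name `flag_close_values` and its parameters are made up for illustration, and it assumes "within threshold" means an absolute gap of at most `threshold` to the nearest neighbour in the same group. It sorts internally, so the input does not need to be pre-sorted, and it returns labels aligned to the original row order.

```python
import numpy as np
import pandas as pd


def flag_close_values(df, group_col="vin", value_col="accuracy", threshold=1):
    """Label each row 'pass' if another value in its group is within `threshold`."""
    # Sort so that the nearest values within each group sit on adjacent rows.
    ordered = df.sort_values([group_col, value_col])
    # Gap to the previous value in the same group (NaN, hence False, for the first row).
    close_to_prev = ordered.groupby(group_col)[value_col].diff().le(threshold)
    # A row also passes when the next row in its group is close to it.
    close_to_next = close_to_prev.groupby(ordered[group_col]).shift(-1, fill_value=False)
    labels = np.where(close_to_prev | close_to_next, "pass", "fail")
    # Restore the caller's original row order.
    return pd.Series(labels, index=ordered.index).reindex(df.index)


df = pd.DataFrame({'vin': ['aaa', 'aaa', 'aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc', 'ccc', 'ddd'],
                   'accuracy': [1, 2, 3, 9, 22, 23, 211, 212, 34, 39, 40, 55]})
df['Result'] = flag_close_values(df)
```

A group with a single row, such as 'ddd', has no neighbour to compare against, so it is always labelled 'fail'.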