Home > database >  Check if specific values in column follow each other for each id
Check if specific values in column follow each other for each id

Time:08-24

I have the following dataframe

id | status
____________
1  | reserved
2  | signed
1  | waiting
1  | signed
3  | waiting
2  | sold
3  | reserved
1  | sold

I want to chech a hypothesis that statuses reserved, waiting, signed always lead to status sold. I only need to check the following order, some statuses may be omitted like for id == 2 in dataframe.

I wonder if there's a way to look for next row values in grouped by id dataframe

Expected output is dataframe or list of ids that follow the above rule. For the dataframe above it would be this:

id
__
1
2

My attemp was to get all unique id with those statuses and then for each id found list of it's statuses. Then I thought to filter it somehow but there are a lot of combinations.

df = df[df.status.isin(['reserved', 'waiting', 'signed', 'sold'])]

df1 = df.groupby('flat_id').['status'].unique()

df1.where('status'== [''reserved', 'waiting', 'signed', 'sold'']
or ['reserved', 'waiting', 'sold'] ... )

CodePudding user response:

IIUC, you just want to check that 'sold' is the last value per group:

m = df.groupby('id')['status'].apply(lambda s: s.iloc[-1] == 'sold')
out = m[m].index.tolist()

output: [1, 2]

If you want to ensure there is something before 'sold':

m = df.groupby('id')['status'].apply(lambda s: len(s)>1 and s.iloc[-1] == 'sold')

And if you want to ensure that this something is in a specific list:

m = df.groupby('id')['status'].apply(lambda s: s.isin(['reserved', 'waiting', 'signed']).any()
                                           and s.iloc[-1] == 'sold')
m[m].index.tolist()
alternative:
(df.drop_duplicates('id', keep='last')
   .loc[lambda d: d['status'].eq('sold'), 'id']
)

output:

5    2
7    1
Name: id, dtype: int64
  • Related