I have the following dataframe
id | status
____________
1 | reserved
2 | signed
1 | waiting
1 | signed
3 | waiting
2 | sold
3 | reserved
1 | sold
I want to chech a hypothesis that statuses reserved
, waiting
, signed
always lead to status sold
. I only need to check the following order, some statuses may be omitted like for id == 2
in dataframe.
I wonder if there's a way to look for next row values in grouped by id dataframe
Expected output is dataframe or list of id
s that follow the above rule. For the dataframe above it would be this:
id
__
1
2
My attemp was to get all unique id
with those statuses and then for each id
found list of it's statuses. Then I thought to filter it somehow but there are a lot of combinations.
df = df[df.status.isin(['reserved', 'waiting', 'signed', 'sold'])]
df1 = df.groupby('flat_id').['status'].unique()
df1.where('status'== [''reserved', 'waiting', 'signed', 'sold'']
or ['reserved', 'waiting', 'sold'] ... )
CodePudding user response:
IIUC, you just want to check that 'sold' is the last value per group:
m = df.groupby('id')['status'].apply(lambda s: s.iloc[-1] == 'sold')
out = m[m].index.tolist()
output: [1, 2]
If you want to ensure there is something before 'sold':
m = df.groupby('id')['status'].apply(lambda s: len(s)>1 and s.iloc[-1] == 'sold')
And if you want to ensure that this something is in a specific list:
m = df.groupby('id')['status'].apply(lambda s: s.isin(['reserved', 'waiting', 'signed']).any()
and s.iloc[-1] == 'sold')
m[m].index.tolist()
alternative:
(df.drop_duplicates('id', keep='last')
.loc[lambda d: d['status'].eq('sold'), 'id']
)
output:
5 2
7 1
Name: id, dtype: int64