Below is my current df where it has a 1 if the id goes through a given stage and this is constructed by having a 1 if it is currently in a given stage or if it has gone through that stage in the past.
id | stage 1 | stage 2 | stage 3 | stage 4 | stage 5 | stage 6 | close lost | close won |
---|---|---|---|---|---|---|---|---|
1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
2 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
2 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
This next table is what I am looking to generate. I want to only know if a given id has skipped a stage or not and I want it to be sequential like it is in the previous table.
id | stage 1 skip | stage 2 skip | stage 3 skip | stage 4 skip | stage 5 skip | stage 6 skip | is_closed |
---|---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
2 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
2 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
CodePudding user response:
Solution
- Set the index to
id
then change thedtype
of remaining columns to boolean - Invert the columns and calculate the cumulative maximum along the columns axis, this would serve as the indicator which shows the last available stage that an
id
has gone thorugh - Negate the logical mask from step 1 so that it will show us stages that are skipped by given
id
, then take the logical and with the mask from step 2. The resulting mask will give us the indicator of which stages are skipped by the givenid
m = df.set_index('id').astype(bool)
s = (m.loc[:, ::-1].cummax(1).loc[:, ::-1] & ~m)
s = s.drop(['close lost', 'close won'], axis=1).astype(int).add_suffix(' skip')
s['is_closed'] = (m['close lost'] | m['close won']).astype(int)
print(s)
stage 1 skip stage 2 skip stage 3 skip stage 4 skip stage 5 skip stage 6 skip is_closed
id
1 0 0 0 0 0 0 0
1 0 1 1 0 0 0 0
1 0 1 1 0 0 0 0
1 0 1 1 0 0 1 1
2 1 0 0 0 0 0 0
2 1 0 1 0 0 0 0
2 1 0 1 0 1 0 0
2 1 0 1 0 1 0 1