Home > Mobile >  Convert stage dummy variables into a df that shows skipped stages using pandas
Convert stage dummy variables into a df that shows skipped stages using pandas

Time:09-22

Below is my current df where it has a 1 if the id goes through a given stage and this is constructed by having a 1 if it is currently in a given stage or if it has gone through that stage in the past.

id stage 1 stage 2 stage 3 stage 4 stage 5 stage 6 close lost close won
1 1 0 0 0 0 0 0 0
1 1 0 0 1 0 0 0 0
1 1 0 0 1 1 0 0 0
1 1 0 0 1 1 0 1 0
2 0 1 0 0 0 0 0 0
2 0 1 0 1 0 0 0 0
2 0 1 0 1 0 1 0 0
2 0 1 0 1 0 1 0 1

This next table is what I am looking to generate. I want to only know if a given id has skipped a stage or not and I want it to be sequential like it is in the previous table.

id stage 1 skip stage 2 skip stage 3 skip stage 4 skip stage 5 skip stage 6 skip is_closed
1 0 0 0 0 0 0 0
1 0 1 1 0 0 0 0
1 0 1 1 0 0 0 0
1 0 1 1 0 0 1 1
2 1 0 0 0 0 0 0
2 1 0 1 0 0 0 0
2 1 0 1 0 1 0 0
2 1 0 1 0 1 0 1

CodePudding user response:

Solution

  • Set the index to id then change the dtype of remaining columns to boolean
  • Invert the columns and calculate the cumulative maximum along the columns axis, this would serve as the indicator which shows the last available stage that an id has gone thorugh
  • Negate the logical mask from step 1 so that it will show us stages that are skipped by given id, then take the logical and with the mask from step 2. The resulting mask will give us the indicator of which stages are skipped by the given id
m = df.set_index('id').astype(bool)
s = (m.loc[:, ::-1].cummax(1).loc[:, ::-1] & ~m)

s = s.drop(['close lost', 'close won'], axis=1).astype(int).add_suffix(' skip')
s['is_closed'] = (m['close lost'] | m['close won']).astype(int)

print(s)

    stage 1 skip  stage 2 skip  stage 3 skip  stage 4 skip  stage 5 skip  stage 6 skip  is_closed
id                                                                                               
1              0             0             0             0             0             0          0
1              0             1             1             0             0             0          0
1              0             1             1             0             0             0          0
1              0             1             1             0             0             1          1
2              1             0             0             0             0             0          0
2              1             0             1             0             0             0          0
2              1             0             1             0             1             0          0
2              1             0             1             0             1             0          1
  • Related