Pandas long format of success table

I have a table with the following structure in pandas:


import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "user_events": [1, 2, 3],
    "user_successes": [0, 1, 2]
})

I would like to convert it to long format. Each user has a different number of events and successes. I want to transform this into an event table, where each row corresponds to one event and a column indicates whether that event was a success. For the data above, the output dataframe should be:

out_df = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 3, 3],
    "success": [0, 1, 0, 1, 1, 0]
})

What's the simplest way of doing this in pandas? I would like to avoid for loops that iterate over each user, create a dataframe per user, and then append them.

CodePudding user response：

You can use reindex with repeat to create one row per event, then set the success flag using a per-user running count from groupby cumsum, capped at user_successes:

s = df.reindex(df.index.repeat(df.user_events))
s['success'] = 1
s['success'] = s['success'].where(s.groupby('user_id')['success'].cumsum()<=s['user_successes'],0)
s
Out[54]: 
   user_id  user_events  user_successes  success
0        1            1               0        0
1        2            2               1        1
1        2            2               1        0
2        3            3               2        1
2        3            3               2        1
2        3            3               2        0
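
The result above still carries the helper columns and the repeated index, while the question asks for just user_id and success. A minimal, self-contained sketch of the same steps with that last trimming step added is shown here; the variable name running is only illustrative, and the code assumes user_successes never exceeds user_events.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "user_events": [1, 2, 3],
    "user_successes": [0, 1, 2],
})

# One row per event: repeat each row label user_events times.
s = df.reindex(df.index.repeat(df["user_events"]))

# Start with every event marked as a success...
s["success"] = 1
# ...then keep the 1 only while the running count per user
# is still within that user's user_successes.
running = s.groupby("user_id")["success"].cumsum()
s["success"] = s["success"].where(running <= s["user_successes"], 0)

# Keep only the requested columns and renumber the rows from 0.
out_df = s[["user_id", "success"]].reset_index(drop=True)
print(out_df)
```

With this approach the successes always come first within each user, which matches the ordering in the expected output of the question.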

CodePudding user response：

One option with reindexing:

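# Lookup table indexed by user_id, used below to map each user to their counts
df2 = df.set_index('user_id')

out = (df
   [['user_id']]
   # one row per event: repeat each row label user_events times
   .loc[df.index.repeat(df2['user_events'])]
   # number the events within each user (0, 1, 2, ...) and flag
   # the first user_successes of them as 1, the rest as 0
   .assign(success=lambda d: d.groupby('user_id').cumcount()
                              .lt(d['user_id'].map(df2['user_successes']))
                              .astype(int)
          )
)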

output:

   user_id  success
0        1        0
1        2        1
1        2        0
2        3        1
2        3        1
2        3        0
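
For comparison, the same table can also be built directly with NumPy, without groupby. This is not taken from either answer; it is a rough sketch of an alternative technique, and the names reps, starts, and pos are made up for illustration. It assumes user_successes never exceeds user_events, and it does not require user_id to be unique.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "user_events": [1, 2, 3],
    "user_successes": [0, 1, 2],
})

# Number of output rows contributed by each input row.
reps = df["user_events"].to_numpy()

# Repeat the per-user values once per event.
user_id = np.repeat(df["user_id"].to_numpy(), reps)
successes = np.repeat(df["user_successes"].to_numpy(), reps)

# Position of each event within its user: 0 | 0 1 | 0 1 2
starts = np.repeat(np.cumsum(reps) - reps, reps)
pos = np.arange(reps.sum()) - starts

# The first user_successes events of each user are flagged as 1.
out = pd.DataFrame({
    "user_id": user_id,
    "success": (pos < successes).astype(int),
})
print(out)
```

Unlike the reindexing approaches, this produces a fresh 0..n-1 index, so no reset_index step is needed afterwards.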