Home > Net >  Pandas long format of success table
Pandas long format of success table

Time:09-08

I have a table with the following structure in pandas:


import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "user_events": [1, 2, 3],
    "user_successes": [0, 1, 2]
})

I would like to put it in a long format. In this case, we have, for each user, a different number of events, and successes. I would like to transform this into an event table (each row corresponds to an event, and there is a column that tells you whether it was a success or not). In this case, the output dataframe should be:

out_df = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 3, 3],
    "success": [0, 1, 0, 1, 1, 0]
})

What's the simplest way of doing this in pandas? I would like to avoid for loops that iterate on each user, create dataframes and then append them.

CodePudding user response:

You may try with reindex with repeat then assign the value with limit condition created by groupby cumsum

s = df.reindex(df.index.repeat(df.user_events))
s['success'] = 1
s['success'] = s['success'].where(s.groupby('user_id')['success'].cumsum()<=s['user_successes'],0)
s
Out[54]: 
   user_id  user_events  user_successes  success
0        1            1               0        0
1        2            2               1        1
1        2            2               1        0
2        3            3               2        1
2        3            3               2        1
2        3            3               2        0

CodePudding user response:

One option with reindexing:

df2 = df.set_index('user_id')

out = (df
   [['user_id']]
   .loc[df.index.repeat(df2['user_events'])]
   .assign(success=lambda d: d.groupby('user_id').cumcount()
                              .lt(d['user_id'].map(df2['user_successes']))
                              .astype(int)
          )
)

output:

   user_id  success
0        1        0
1        2        1
1        2        0
2        3        1
2        3        1
2        3        0
  • Related