I have tried to unstack a dataframe with duplicates jumping through different hoops. So far without result. I would be grateful for any help:
I have a dataframe that have a 'long' format:
| id | variable | value |
|----|-----------|-------|
| 1 | outcome_1 | NaN |
| 2 | outcome_1 | 18:33 |
| 2 | outcome_1 | 20:39 |
| 2 | outcome_3 | 01:40 |
| 3 | outcome_2 | 03:59 |
| 3 | outcome_4 | 07:46 |
| 3 | outcome_3 | 10:53 |
And would like to convert it to a 'wide' format, but without aggregation and preserving all values, so the result would look like this:
| id_nmbr | outcome_1_0 | outcome_1_1 | outcome_2_0 | outcome_3_0 | outcome_4_0 |
|---------|-------------|-------------|-------------|-------------|-------------|
| 1 | NaN | NaN | NaN | NaN | NaN |
| 2 | 18:33 | 20:39 | NaN | 01:40 | NaN |
| 3 | NaN | NaN | 03:59 | 07:46 | 10:53 |
So basically, preserve each value, and create a new column for each duplicate.
I have tried pivot or unstack, as well as pivot_table, but I think I need to string some functions together to achieve it. Any ideas?
CodePudding user response:
Use GroupBy.cumcount
for counter, then reshape by Series.unstack
with sorting MultiIndex
and flatten in map
:
g = df.groupby(['id','variable']).cumcount()
df = df.set_index(['id','variable', g])['value'].unstack([1,2]).sort_index(axis=1)
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df.reset_index()
print (df)
id outcome_1_0 outcome_1_1 outcome_2_0 outcome_3_0 outcome_4_0
0 1 NaN NaN NaN NaN NaN
1 2 18:33 20:39 NaN 01:40 NaN
2 3 NaN NaN 03:59 10:53 07:46