Home > Enterprise >  Pandas - unstack with duplicates
Pandas - unstack with duplicates

Time:10-19

I have tried to unstack a dataframe with duplicates jumping through different hoops. So far without result. I would be grateful for any help:

I have a dataframe that have a 'long' format:

| id | variable  | value |
|----|-----------|-------|
| 1  | outcome_1 | NaN   |
| 2  | outcome_1 | 18:33 |
| 2  | outcome_1 | 20:39 |
| 2  | outcome_3 | 01:40 |
| 3  | outcome_2 | 03:59 |
| 3  | outcome_4 | 07:46 |
| 3  | outcome_3 | 10:53 |

And would like to convert it to a 'wide' format, but without aggregation and preserving all values, so the result would look like this:

| id_nmbr | outcome_1_0 | outcome_1_1 | outcome_2_0 | outcome_3_0 | outcome_4_0 |
|---------|-------------|-------------|-------------|-------------|-------------|
| 1       | NaN         | NaN         | NaN         | NaN         | NaN         |
| 2       | 18:33       | 20:39       | NaN         | 01:40       | NaN         |
| 3       | NaN         | NaN         | 03:59       | 07:46       | 10:53       |

So basically, preserve each value, and create a new column for each duplicate.

I have tried pivot or unstack, as well as pivot_table, but I think I need to string some functions together to achieve it. Any ideas?

CodePudding user response:

Use GroupBy.cumcount for counter, then reshape by Series.unstack with sorting MultiIndex and flatten in map:

g = df.groupby(['id','variable']).cumcount()

df = df.set_index(['id','variable', g])['value'].unstack([1,2]).sort_index(axis=1)
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df.reset_index()
print (df)
   id outcome_1_0 outcome_1_1 outcome_2_0 outcome_3_0 outcome_4_0
0   1         NaN         NaN         NaN         NaN         NaN
1   2       18:33       20:39         NaN       01:40         NaN
2   3         NaN         NaN       03:59       10:53       07:46
  • Related