I have a dataframe like this:
Sequence | Message |
---|---|
1 | A |
2 | B |
3 | C |
4 | D |
1 | A |
2 | C |
3 | D |
4 | E |
5 | F |
1 | A |
2 | C |
3 | D |
And I'm trying to pivot it to have table with maximum Sequence
length as columns and corresponding possible Message
as values.
1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|
A | B | C | D | |||
A | C | D | E | F | ||
A | C | D |
I'm trying to use pd.get_dummies() for Sequence
, but can't figure out how to keep Message
values.
Appreciate any help.
CodePudding user response:
A possible solution:
aux = df.groupby('Sequence').agg(list).T.iloc[0,:].rename_axis(None)
(pd.concat([pd.DataFrame(np.array(x)[:,None]) for x in aux], axis=1)
.set_axis(aux.index, axis=1))
Output:
1 2 3 4 5
0 A B C D F
1 A C D E NaN
2 A C D NaN NaN
CodePudding user response:
Use a pivot
with df['Sequence'].diff().le(0).cumsum()
as index (new row when the Sequence restarts to an earlier number):
(df.assign(index=df['Sequence'].diff().le(0).cumsum())
.pivot(index='index', columns='Sequence', values='Message')
.rename_axis(index=None, columns=None) # optional
)
Output:
1 2 3 4 5
0 A B C D NaN
1 A C D E F
2 A C D NaN NaN