Home > Software design >  Pandas: get dummies and keep other column as values
Pandas: get dummies and keep other column as values

Time:01-24

I have a dataframe like this:

Sequence Message
1 A
2 B
3 C
4 D
1 A
2 C
3 D
4 E
5 F
1 A
2 C
3 D

And I'm trying to pivot it to have table with maximum Sequence length as columns and corresponding possible Message as values.

1 2 3 4 5 6 7
A B C D
A C D E F
A C D

I'm trying to use pd.get_dummies() for Sequence , but can't figure out how to keep Message values.

Appreciate any help.

CodePudding user response:

A possible solution:

aux = df.groupby('Sequence').agg(list).T.iloc[0,:].rename_axis(None)

(pd.concat([pd.DataFrame(np.array(x)[:,None]) for x in aux], axis=1)
 .set_axis(aux.index, axis=1))

Output:

   1  2  3    4    5
0  A  B  C    D    F
1  A  C  D    E  NaN
2  A  C  D  NaN  NaN

CodePudding user response:

Use a pivot with df['Sequence'].diff().le(0).cumsum() as index (new row when the Sequence restarts to an earlier number):

(df.assign(index=df['Sequence'].diff().le(0).cumsum())
   .pivot(index='index', columns='Sequence', values='Message')
   .rename_axis(index=None, columns=None) # optional
)

Output:

   1  2  3    4    5
0  A  B  C    D  NaN
1  A  C  D    E    F
2  A  C  D  NaN  NaN
  • Related