Pivot a DataFrame without losing repeated values

Time:10-23

I have this input:

id   value  reps
1    333    1
1    332    4
1    335    1   
4    555    3
4    225    1
444  2      5

I want this output, with each id's values ordered by the reps column from lowest to highest:

id   col1 col2 col3 col4 
1    333  335  332  nan 
4    225  555  nan  nan 
444  2    nan  nan  nan

I tried using pivot_table and got this:

dataframe = dataframe.pivot_table(index='id', columns='reps', values='value')
dataframe = dataframe.rename_axis(columns=None).reset_index()

id   1      3      4    5
1    334    nan    332  nan
4    225.5  555.5  nan  nan
444  nan    nan    nan  2
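The repeated values disappear because pivot_table aggregates rows that share the same (id, reps) pair, and its default aggfunc is the mean: id 1 has two rows with reps 1 (333 and 335), which collapse to 334. A minimal sketch on the sample rows typed in from the question, which also shows that DataFrame.pivot refuses such duplicates instead of aggregating them:

```python
import pandas as pd

# Sample input typed in from the question
df = pd.DataFrame({
    "id":    [1, 1, 1, 4, 4, 444],
    "value": [333, 332, 335, 555, 225, 2],
    "reps":  [1, 4, 1, 3, 1, 5],
})

# pivot_table aggregates duplicate (id, reps) pairs with its default aggfunc, the mean
averaged = df.pivot_table(index="id", columns="reps", values="value")
print(averaged.loc[1, 1])  # 334.0, the mean of 333 and 335

# pivot does no aggregation, so the duplicate (1, 1) pair makes it raise ValueError
try:
    df.pivot(index="id", columns="reps", values="value")
except ValueError as err:
    print("pivot failed:", err)
```

With the rows exactly as posted, id 4 has a single value per reps, so the mean leaves it unchanged (225.0 and 555.0).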

Answer:

First sort the dataframe by reps (and also by id, if needed), then replace reps with the cumulative count within each id, so that repeated reps values get distinct numbers 1, 2, 3, ...

Then run pivot_table as in your attempt.

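# Sort so each id's rows are in ascending reps order, then renumber
# reps as 1, 2, 3, ... within each id so no (id, reps) pair repeats
df \
  .sort_values(['id','reps']) \
  .pipe(lambda df_: df_.assign(reps=df_.groupby(['id']).cumcount() + 1)) \
  .pivot_table(index='id', columns='reps', values='value') \
  .add_prefix('col')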
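To see what the renumbering step produces, here is a sketch that runs the same chain on the sample rows typed in from the question and keeps the intermediate frame. It uses assign with a callable instead of pipe, which is an equivalent spelling rather than the answer's exact code.

```python
import pandas as pd

# Sample input typed in from the question
df = pd.DataFrame({
    "id":    [1, 1, 1, 4, 4, 444],
    "value": [333, 332, 335, 555, 225, 2],
    "reps":  [1, 4, 1, 3, 1, 5],
})

# Sort by id, then reps; the tied 333 and 335 (both reps 1) keep their input order
ordered = df.sort_values(["id", "reps"])

# Replace reps with the position 1, 2, 3, ... of each row inside its id
renumbered = ordered.assign(reps=lambda d: d.groupby("id").cumcount() + 1)
print(renumbered)

# Every (id, reps) pair is now unique, so the mean inside pivot_table changes nothing
result = renumbered.pivot_table(index="id", columns="reps", values="value").add_prefix("col")
print(result)
```

The values come out as floats because pivot_table has to fill the missing cells with NaN.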

This will not produce the all-NaN col4 from your expected output, but you can add it afterwards.
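One way to add it afterwards is DataFrame.reindex, which inserts any listed column that is missing and fills it with NaN. A sketch assuming you know the total number of columns you want (four, as in the question's expected output); the frame below is hard-coded from that expected output to stand in for the pivot result:

```python
import pandas as pd

# Stand-in for the pivot result, hard-coded from the question's expected output
result = pd.DataFrame(
    {"col1": [333.0, 225.0, 2.0], "col2": [335.0, 555.0, None], "col3": [332.0, None, None]},
    index=pd.Index([1, 4, 444], name="id"),
)

# Assumed target width: col1..col4; reindex adds col4 and fills it with NaN
wanted = [f"col{i}" for i in range(1, 5)]
result = result.reindex(columns=wanted)
print(result)
```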

Answer:

Group by "id", build ("value", "reps") tuples, and sort them by "reps":

df_grouped = df.groupby("id")[["value", "reps"]].apply(lambda g: sorted(zip(g["value"], g["reps"]), key=lambda pair: pair[1]))

[Out]:
id
1      [(333, 1), (335, 1), (332, 4)]
4                [(225, 1), (555, 3)]
444                          [(2, 5)]

Create the result DataFrame from the sorted tuples above, using "value" (element 0 of each tuple) as the data:

df_result = pd.DataFrame(data=[[y[0] for y in x] for x in df_grouped], index=df_grouped.index)

[Out]:
       0      1      2
id                    
1    333    335    332
4    225    555    NaN
444    2    NaN    NaN
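The NaN cells come from the DataFrame constructor itself: given rows of different lengths, it pads the shorter ones with missing values. A small sketch with the per-id value lists hard-coded from the sorted tuples shown above:

```python
import pandas as pd

# Per-id value lists of unequal length, hard-coded from the sorted tuples above
rows = [[333, 335, 332], [225, 555], [2]]

# Shorter rows are padded with NaN; a column that receives a NaN becomes float64
padded = pd.DataFrame(rows, index=pd.Index([1, 4, 444], name="id"))
print(padded)
print(padded.dtypes)
```

Because columns 1 and 2 become float64, an actual printout shows 335.0 and 332.0 rather than the integers displayed above; only column 0 stays int64.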

pandas assigns default column names starting at 0. You can renumber them from 1 if required:

df_result.columns = list(range(1, len(df_result.columns) + 1))
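Putting this answer's steps together, a runnable sketch on the sample rows typed in from the question. The "col" prefix in the final rename is an assumption chosen to match the requested layout; the answer itself numbers the columns 1, 2, 3.

```python
import pandas as pd

# Sample input typed in from the question
df = pd.DataFrame({
    "id":    [1, 1, 1, 4, 4, 444],
    "value": [333, 332, 335, 555, 225, 2],
    "reps":  [1, 4, 1, 3, 1, 5],
})

# For each id, sort its (value, reps) pairs by reps; sorted() is stable, so ties keep input order
grouped = df.groupby("id")[["value", "reps"]].apply(
    lambda g: sorted(zip(g["value"], g["reps"]), key=lambda pair: pair[1])
)

# Keep only the value (element 0) of each pair; shorter rows are padded with NaN
result = pd.DataFrame([[v for v, _ in pairs] for pairs in grouped], index=grouped.index)

# Assumed naming: rename 0, 1, 2 to col1, col2, col3
result.columns = [f"col{i}" for i in range(1, len(result.columns) + 1)]
print(result)
```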