I have this input:
id   value  reps
1    333    1
1    332    4
1    335    1
4    555    3
4    225    1
444  2      5
I want this output, with each id's values ordered by the reps column from lowest to highest:
id   col1  col2  col3  col4
1    333   335   332   nan
4    225   555   nan   nan
444  2     nan   nan   nan
I tried using pivot_table and got this:
dataframe = dataframe.pivot_table(index='id', columns='reps', values='value')
dataframe = dataframe.rename_axis(columns=None).reset_index()
id   1      3      4    5
1    334    nan    332  nan
4    225.5  555.5  nan  nan
444  nan    nan    nan  2
CodePudding user response:
You can first sort the dataframe by reps (and also by id, if needed) and then replace reps with the cumulative count within each id.
Then you can perform the pivot table as you suggested.
df \
.sort_values(['id','reps']) \
.pipe(lambda df_: df_.assign(reps=df_.groupby(['id']).cumcount() + 1)) \
.pivot_table(index='id', columns='reps', values='value') \
.add_prefix('col')
It will not return the all-NaN col4 from your expected output, but you can add it later.
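One way to add the missing column afterwards is to reindex the pivoted columns to a fixed width before adding the prefix. This is only a sketch: the sample frame is rebuilt from the question, and n_cols and the helper column name pos are assumptions introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 4, 4, 444],
    "value": [333, 332, 335, 555, 225, 2],
    "reps": [1, 4, 1, 3, 1, 5],
})

n_cols = 4  # assumed target width, taken from the expected output

result = (
    df.sort_values(["id", "reps"])
      # pos is a hypothetical helper column: 1, 2, 3, ... within each id
      .assign(pos=lambda d: d.groupby("id").cumcount() + 1)
      .pivot_table(index="id", columns="pos", values="value")
      # reindex creates any missing positions (here 4) filled with NaN
      .reindex(columns=range(1, n_cols + 1))
      .add_prefix("col")
      .rename_axis(columns=None)
      .reset_index()
)
print(result)
```

Because each (id, pos) pair is unique after the cumulative count, the default mean aggregation of pivot_table leaves the values unchanged; they only become floats because of the NaN padding.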
CodePudding user response:
Group by "id", build ("value", "reps") tuples, and sort them by "reps":
df_grouped = df.groupby("id")[["value", "reps"]].apply(lambda x: sorted(list(zip(x.value, x.reps)), key=lambda x: x[1]))
[Out]:
id
1      [(333, 1), (335, 1), (332, 4)]
4      [(225, 1), (555, 3)]
444    [(2, 5)]
Create the result dataframe from the sorted tuples above, using "value" (index 0 of each tuple) as the data.
df_result = pd.DataFrame(data=[[y[0] for y in x] for x in df_grouped], index=df_grouped.index)
[Out]:
       0    1    2
id
1    333  335  332
4    225  555  NaN
444    2  NaN  NaN
pandas fills in default column names starting with 0. You can change them if required:
df_result.columns = list(range(1, len(df_result.columns) + 1))
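If you want the col1, col2, ... names from the question instead of plain integers, the same idea can be written a little more compactly by sorting first and collecting each id's values into a list. This is a sketch of a variation, not the exact code above; the names lists and wide are made up for the example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 4, 4, 444],
    "value": [333, 332, 335, 555, 225, 2],
    "reps": [1, 4, 1, 3, 1, 5],
})

# Sort once; groupby keeps the row order inside each group,
# so every list comes out ordered by reps.
lists = df.sort_values(["id", "reps"]).groupby("id")["value"].agg(list)

# Lists of unequal length are padded with NaN when the frame is built.
wide = pd.DataFrame(lists.tolist(), index=lists.index)
wide.columns = [f"col{i}" for i in range(1, len(wide.columns) + 1)]
wide = wide.reset_index()
print(wide)
```

As with the tuple-based version, the number of columns equals the longest group, so an extra all-NaN column such as col4 would still have to be added separately.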