I have a dataframe, df_res
with volumetric and mean and standard deviation jacobian values against ID strings (subjid
), software type ran (pipeline
), and anatomical region labels (label_id
):
df_res =
subjid pipeline label_id volume_(mm^3) mean_jacobian stdev_jacobian
0 100007_t0 Rigid 0 10100000.0 1.11 0.078
1 100007_t0 Rigid 1 315439.0 1.04635 0.283
2 100007_t0 Rigid 2 624165.0 0.968231 0.192
3 100007_t0 Rigid 3 515589.0 1.1273 0.229
4 100007_t1 Rigid 0 10084600.0 1.0935 0.033
5 100007_t1 Rigid 1 320533.0 1.0457 0.277
6 100007_t1 Rigid 2 621393.0 0.957 0.193
7 100007_t1 Rigid 3 507840.0 1.00573 0.232
My goal dataframe is something where I would want to have one row per subjid
-pipeline
pair, and the label_id
is almost transposed as column per quantity.
The preview below is a "condensed version" for formatting's sake.
Assume that each label_id
from df_res
has its own {label_id}_volume_(mm^3)
, {label_id}_mean_jacobian
, and {label_id}_stdev_jacobian
columns (total of 9 for each label_id
pipeline
subjid
= 11 total columns):
df_goal =
subjid pipeline label0_volume_(mm^3) ... label3_volume_(mm^3) ... label3_mean_jacobian
100007_t0 Rigid 10100000.0 515589.0 1.1273
100007_t1 Rigid 10084600.0 507840.0 1.00573
I managed to get something close, with df_res_test
. I did a df.pivot()
function:
>>> df_res_pivot = df_res.pivot(index="subjid", columns="label_id", values=["volume_(mm^3)", "mean_jacobian", "stdev_jacobian"])
df_res_pivot =
volume_(mm^3) ... stdev_jacobian
label_id 0 1 2 ... 1 2 3
subjid ...
100007_t0 10100000.0 315439.0 624165.0 ... 0.289318 0.192214 0.229341
100007_t1 10084600.0 320533.0 621393.0 ... 0.277735 0.193940 0.232486
[2 rows x 12 columns]
Is there a way I can rename / combine the values
and columns
argument to make my data more sensible and look like df_goal
?
CodePudding user response:
@AndrejKesely's solution is great. Another option is with pivot_wider from pyjanitor, using the names_glue
parameter:
# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_wider(
index = ['subjid', 'pipeline'],
names_from = 'label_id',
flatten_levels=True,
names_glue = "label{label_id}_{_value}")
)
subjid pipeline label0_volume_(mm^3) label1_volume_(mm^3) ... label0_stdev_jacobian label1_stdev_jacobian label2_stdev_jacobian label3_stdev_jacobian
0 100007_t0 Rigid 10100000.0 315439.0 ... 0.078 0.283 0.192 0.229
1 100007_t1 Rigid 10084600.0 320533.0 ... 0.033 0.277 0.193 0.232
CodePudding user response:
Try:
df = df.pivot(index=["subjid", "pipeline"], columns=["label_id"])
df.columns = [f"label{b}_{a}" for a, b in df.columns]
print(df.reset_index())
Prints:
subjid pipeline label0_volume_(mm^3) label1_volume_(mm^3) label2_volume_(mm^3) label3_volume_(mm^3) label0_mean_jacobian label1_mean_jacobian label2_mean_jacobian label3_mean_jacobian label0_stdev_jacobian label1_stdev_jacobian label2_stdev_jacobian label3_stdev_jacobian
0 100007_t0 Rigid 10100000.0 315439.0 624165.0 515589.0 1.1100 1.04635 0.968231 1.12730 0.078 0.283 0.192 0.229
1 100007_t1 Rigid 10084600.0 320533.0 621393.0 507840.0 1.0935 1.04570 0.957000 1.00573 0.033 0.277 0.193 0.232
EDIT: With the dataframe df_res_pivot
you have in your question:
df_res_pivot = df.pivot(
index="subjid",
columns="label_id",
values=["volume_(mm^3)", "mean_jacobian", "stdev_jacobian"],
)
df_res_pivot.columns = [f"label{b}_{a}" for a, b in df_res_pivot.columns]
print(df_res_pivot.reset_index())
Prints:
subjid label0_volume_(mm^3) label1_volume_(mm^3) label2_volume_(mm^3) label3_volume_(mm^3) label0_mean_jacobian label1_mean_jacobian label2_mean_jacobian label3_mean_jacobian label0_stdev_jacobian label1_stdev_jacobian label2_stdev_jacobian label3_stdev_jacobian
0 100007_t0 10100000.0 315439.0 624165.0 515589.0 1.1100 1.04635 0.968231 1.12730 0.078 0.283 0.192 0.229
1 100007_t1 10084600.0 320533.0 621393.0 507840.0 1.0935 1.04570 0.957000 1.00573 0.033 0.277 0.193 0.232