pivot dataframe such that unique column names become headers and unique references become index (spa-CodePudding

given the dataframe as follows:

index    name    reference    value
0        name_1  ab1234       0.1 
1        name_1  cd1111       0.2
2        name_1  cd1112       0.01
3        name_2  ab1234       0.03       
4        name_2  ab4567       0.1
...
n        name_j  lm1234       0.x

How can one transform the data into a new dataframe such that the unique references are the index and the unique names are the columns.

We know that the unique references that make up the index are indices are reference.unique(). And we also know that the unique names that make up the columns headers are name.unique().

The expected result would be a sparse matrix of this form:

reference    name_1    name_2    name_3    ...    name j
ab1234       0.1       0.03
cd1111       0.2
cd1112       0.01
ab1234       
ab4567                 0.1

...                                        ...    
lm1234                                            0.x

Expect that there is a vectorized way or a pivot that would work. For example, something like this was attempted, but failed:

new_df = df.pivot(index='reference', columns=name.unique(), values='value')

CodePudding user response：

df.pivot_table(index='reference', columns='name', values='value', aggfunc='sum')

Output:

name       name_1  name_2
reference
ab1234       0.10    0.03
ab4567        NaN    0.10
cd1111       0.20     NaN
cd1112       0.01     NaN