given the dataframe as follows:
index name reference value
0 name_1 ab1234 0.1
1 name_1 cd1111 0.2
2 name_1 cd1112 0.01
3 name_2 ab1234 0.03
4 name_2 ab4567 0.1
...
n name_j lm1234 0.x
How can one transform the data into a new dataframe such that the unique references are the index and the unique names are the columns.
We know that the unique references that make up the index are indices are reference.unique()
. And we also know that the unique names that make up the columns headers are name.unique()
.
The expected result would be a sparse matrix of this form:
reference name_1 name_2 name_3 ... name j
ab1234 0.1 0.03
cd1111 0.2
cd1112 0.01
ab1234
ab4567 0.1
... ...
lm1234 0.x
Expect that there is a vectorized way or a pivot
that would work.
For example, something like this was attempted, but failed:
new_df = df.pivot(index='reference', columns=name.unique(), values='value')
CodePudding user response:
df.pivot_table(index='reference', columns='name', values='value', aggfunc='sum')
Output:
name name_1 name_2
reference
ab1234 0.10 0.03
ab4567 NaN 0.10
cd1111 0.20 NaN
cd1112 0.01 NaN