Im working with a database that contains the following structure
ID | Test | Result |
---|---|---|
12a | Test1 | Normal |
12a | Test3 | 678.2 |
3s5 | Test2 | <1 |
3s5 | Test1 | Normal |
8r5 | Test4 | Rectangular(3 ) |
As you can see, the different tests have different result formats, and not every ID have all the tests.
I would like to transform this into something as follows:
ID | Test1 | Test2 | Test3 | Test4 |
---|---|---|---|---|
12a | Normal | NA | 678.2 | NA |
3s5 | Normal | <1 | NA | NA |
8r5 | NA | NA | NA | Rectangular(3 ) |
I've tried with pandas.pivot
but encountered the following error
df.pivot(index="ID",columns="Test",values="Result")
ValueError: Index contains duplicate entries, cannot reshape
Changing the Index to ID does not work, neither resetting index.
Any help will be greatly appreciated!
CodePudding user response:
Here is a way of doing it :
df = {'ID': ['12a', '12a', '3s5', '3s5', '8r5'],
'Test': ['Test1', 'Test3', 'Test2', 'Test1', 'Test4'],
'Result': ['Normal', '678.2', '<1', 'Normal', 'Rectangular(3 )']}
df=df.groupby(['ID', 'Test'])['Result'].sum().unstack(fill_value="NA")
CodePudding user response:
You can try using .pivot_table()
instead of .pivot()
, as follows:
df.pivot_table(index="ID", columns="Test", values="Result", aggfunc='first')
Result:
Test Test1 Test2 Test3 Test4
ID
12a Normal NaN 678.2 NaN
3s5 Normal <1 NaN NaN
8r5 NaN NaN NaN Rectangular(3 )