Home > Back-end >  Transforming the data from a single column into multiple columns in pandas
Transforming the data from a single column into multiple columns in pandas

Time:09-23

Im working with a database that contains the following structure

ID Test Result
12a Test1 Normal
12a Test3 678.2
3s5 Test2 <1
3s5 Test1 Normal
8r5 Test4 Rectangular(3 )

As you can see, the different tests have different result formats, and not every ID have all the tests.

I would like to transform this into something as follows:

ID Test1 Test2 Test3 Test4
12a Normal NA 678.2 NA
3s5 Normal <1 NA NA
8r5 NA NA NA Rectangular(3 )

I've tried with pandas.pivot but encountered the following error

df.pivot(index="ID",columns="Test",values="Result")

ValueError: Index contains duplicate entries, cannot reshape

Changing the Index to ID does not work, neither resetting index.

Any help will be greatly appreciated!

CodePudding user response:

Here is a way of doing it :

df = {'ID': ['12a', '12a', '3s5', '3s5', '8r5'],
  'Test': ['Test1', 'Test3', 'Test2', 'Test1', 'Test4'],
  'Result': ['Normal', '678.2', '<1', 'Normal', 'Rectangular(3 )']}

df=df.groupby(['ID', 'Test'])['Result'].sum().unstack(fill_value="NA")

CodePudding user response:

You can try using .pivot_table() instead of .pivot(), as follows:

df.pivot_table(index="ID", columns="Test", values="Result", aggfunc='first')

Result:

Test   Test1 Test2  Test3            Test4
ID                                        
12a   Normal   NaN  678.2              NaN
3s5   Normal    <1    NaN              NaN
8r5      NaN   NaN    NaN  Rectangular(3 )
  • Related