How to transpose a PySpark DataFrame that has multiple index columns?

Time:12-21

I have a dataframe that looks like this:

ID  Company_Id  value    Approve or Reject
1A  3412asd     value-1  Approve
2B  2345tyu     value-2  Approve
3C  9800bvd     value-3  Approve
2B  2345tyu     value-1  Approve

Note that an ID can repeat with a different 'value'. ID and Company_Id are the index columns.

Now I need the output to be:

ID  Company_Id  value-1  value-2  value-3
1A  3412asd     Approve  NULL     NULL
2B  2345tyu     Approve  Approve  NULL
3C  9800bvd     NULL     NULL     Approve

CodePudding user response:

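Use PySpark's pivot: group by the index columns, pivot on 'value', and take the first 'Approve or Reject' entry in each group: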

from pyspark.sql.functions import first
df.groupBy('ID', 'Company_Id').pivot('value').agg(first('Approve or Reject')).show()
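
To make the reshaping concrete, here is a minimal plain-Python sketch of what the groupBy/pivot/first combination computes on the sample rows from the question. It does not use Spark. The helper name pivot_first and its parameters are hypothetical, None stands in for NULL, and listing the new columns in sorted order is a choice made for this sketch.

```python
def pivot_first(rows, index_cols, pivot_col, value_col):
    """Illustrative stand-in for groupBy(index_cols).pivot(pivot_col).agg(first(value_col))."""
    # Each distinct pivot value becomes one output column (sorted for this sketch).
    pivot_values = sorted({row[pivot_col] for row in rows})

    grouped = {}
    for row in rows:
        key = tuple(row[c] for c in index_cols)
        cells = grouped.setdefault(key, {})
        # Keep only the first value seen for each (group, pivot value) pair.
        cells.setdefault(row[pivot_col], row[value_col])

    result = []
    for key, cells in grouped.items():
        out = dict(zip(index_cols, key))
        for pv in pivot_values:
            out[pv] = cells.get(pv)  # None plays the role of NULL
        result.append(out)
    return result


# Sample data copied from the question.
rows = [
    {"ID": "1A", "Company_Id": "3412asd", "value": "value-1", "Approve or Reject": "Approve"},
    {"ID": "2B", "Company_Id": "2345tyu", "value": "value-2", "Approve or Reject": "Approve"},
    {"ID": "3C", "Company_Id": "9800bvd", "value": "value-3", "Approve or Reject": "Approve"},
    {"ID": "2B", "Company_Id": "2345tyu", "value": "value-1", "Approve or Reject": "Approve"},
]

wide = pivot_first(rows, ["ID", "Company_Id"], "value", "Approve or Reject")
for r in wide:
    print(r)
```

If the same (ID, Company_Id, value) combination appears more than once, this sketch keeps whichever entry comes first in the input list. In Spark, row order is generally not guaranteed, so first may pick any of the duplicates.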