I have a dataframe that looks like this:
ID | Company_Id | value | Approve or Reject |
---|---|---|---|
1A | 3412asd | value-1 | Approve |
2B | 2345tyu | value-2 | Approve |
3C | 9800bvd | value-3 | Approve |
2B | 2345tyu | value-1 | Approve |
Note that an ID can repeat with a different 'value'. ID and Company_Id are the index columns.
Now I need the output to be:
ID | Company_Id | value-1 | value-2 | value-3 |
---|---|---|---|---|
1A | 3412asd | Approve | NULL | NULL |
2B | 2345tyu | Approve | Approve | NULL |
3C | 9800bvd | NULL | NULL | Approve |
CodePudding user response:
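Use PySpark's pivot: group by the two key columns, pivot on 'value', and take the first 'Approve or Reject' entry for each cell. Note that first has to be imported from pyspark.sql.functions.
from pyspark.sql.functions import first
df.groupBy('ID', 'Company_Id').pivot('value').agg(first('Approve or Reject')).show()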
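
If it helps to see what that call computes, here is a rough plain-Python sketch of the same group / pivot / first logic on the sample rows from the question. It does not use Spark at all: `pivot_first` is a hypothetical helper written only for illustration, the pivot columns are simply sorted here to get a stable order, and missing combinations come out as `None` (which Spark would display as null).

```python
# Plain-Python illustration of group-by + pivot + "first" aggregation.
# pivot_first is a hypothetical helper, not part of PySpark.

rows = [
    ("1A", "3412asd", "value-1", "Approve"),
    ("2B", "2345tyu", "value-2", "Approve"),
    ("3C", "9800bvd", "value-3", "Approve"),
    ("2B", "2345tyu", "value-1", "Approve"),
]

def pivot_first(rows):
    # Distinct entries of the 'value' column become the new column names.
    columns = sorted({value for _, _, value, _ in rows})
    grouped = {}
    for id_, company_id, value, status in rows:
        # One output row per (ID, Company_Id) pair.
        cells = grouped.setdefault((id_, company_id), {})
        # Keep only the first status seen for each (group, value) pair.
        cells.setdefault(value, status)
    # Combinations that never occur are filled with None.
    table = [
        (id_, company_id, *(cells.get(c) for c in columns))
        for (id_, company_id), cells in grouped.items()
    ]
    return columns, table

columns, table = pivot_first(rows)
for row in table:
    print(row)
```

The row for 2B ends up with Approve under both value-1 and value-2, because its two input rows fall into the same group.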