PySpark: GroupBy and count the sum of unique values for a column


I have event data per device_id, where each event is either successful or unsuccessful.

device_id  status
1          Successful
1          UnSuccessful
1          UnSuccessful
1          UnSuccessful
1          Successful
2          Successful
2          UnSuccessful
2          UnSuccessful

Is there a way to group by device_id and get the result for each ID in a single row, like this:

device_id  success_count  unsuccessful_count
1          2              3
2          1              2

I have tried several approaches with groupBy, but I haven't been able to get the success_count and unsuccessful_count for a device_id in a single row.

CodePudding user response:

Group your data by device_id, then pivot on status and count:

df.groupBy("device_id").pivot("status").count()
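Below is a minimal, self-contained sketch of that approach, assuming a local SparkSession and the sample data from the question; the withColumnRenamed calls are an addition to match the desired output headers, since pivot names the columns after the distinct status values.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("status_pivot").getOrCreate()

# Sample data mirroring the question
df = spark.createDataFrame(
    [
        (1, "Successful"), (1, "UnSuccessful"), (1, "UnSuccessful"),
        (1, "UnSuccessful"), (1, "Successful"), (2, "Successful"),
        (2, "UnSuccessful"), (2, "UnSuccessful"),
    ],
    ["device_id", "status"],
)

result = (
    df.groupBy("device_id")   # one output row per device_id
      .pivot("status")        # one column per distinct status value
      .count()                # cell value = number of events for that status
      # Rename the pivoted columns to the headers requested in the question
      .withColumnRenamed("Successful", "success_count")
      .withColumnRenamed("UnSuccessful", "unsuccessful_count")
)

result.show()
# +---------+-------------+------------------+
# |device_id|success_count|unsuccessful_count|
# +---------+-------------+------------------+
# |        1|            2|                 3|
# |        2|            1|                 2|
# +---------+-------------+------------------+

If a device never has one of the statuses, the corresponding pivoted cell is null; wrap it with coalesce(col, lit(0)) if you need zeros instead.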