PySpark: GroupBy and count the sum of unique values for a column-CodePudding

I have a device_id's event data which might be successful sometime and unsuccessful sometime.

Is there a way to do a group by and get result for an Id in a single row like this:

device_id	success_count	unsuccessful_count
1	2	3
2	1	2

I have been trying several ways using group by but I haven't been able to get the success_count and unsuccessful_count for a device_id in single row.

CodePudding user response：

You need to group your data by device id and then pivot by status and count:

df.groupBy("device_id").pivot("status").count()