How to transpose rows into columns on pyspark?-CodePudding

One question: how can I transpose rows into columns on pyspark? My original dataframe looks like this:

ID | DATE       |  APP       |  DOWNLOADS  |  ACTIVE_USERS
___________________________________________________________
0  | 2021-01-10 |  FACEBOOK  |  1000       |  5000
1  | 2021-01-10 |  INSTAGRAM |  9000       |  90000
2  | 2021-02-10 |  FACEBOOK  |  9000       |  72000
3  | 2021-02-10 |  INSTAGRAM |  16000      |  500000

But I need it like this:

ID | DATE       | FACEBOOK - DOWNLOADS | FACEBOOK - ACTIVE_USERS | INSTAGRAM - DOWNLOADS | INSTAGRAM - ACTIVE_USERS
___________________________________________________________________________________________________________________
0  | 2021-01-10 | 1000                 | 5000                    | 9000                  | 90000  
1  | 2021-02-10 | 9000                 | 72000                   | 16000                 | 50000

I tried using the answer on this question: Transpose pyspark rows into columns, but i couldn't make it work.

Could you help me please? Thank you!

CodePudding user response：

From your example I assume the "ID" column is not needed to group on, as it looks to be reset in your outcome. That would make the query something like below:

import pyspark.sql.functions as F
df.groupBy("DATE").pivot('APP').agg(
    F.first('DOWNLOADS').alias("DOWNLOADS"), 
    F.first("ACTIVE_USERS").alias("ACTIVE_USERS")
)

We groupby the date and pivot on app and retrieve the first value for downloads and active users.

outcome:

 ---------- ------------------ --------------------- ------------------- ---------------------- 
|      DATE|FACEBOOK_DOWNLOADS|FACEBOOK_ACTIVE_USERS|INSTAGRAM_DOWNLOADS|INSTAGRAM_ACTIVE_USERS|
 ---------- ------------------ --------------------- ------------------- ---------------------- 
|2021-02-10|              9000|                72000|              16000|                500000|
|2021-01-10|              1000|                 5000|               9000|                 90000|
 ---------- ------------------ --------------------- ------------------- ----------------------