How to pivot by value in PySpark


Here's my input

+----+-----+---+------+----+------+-------+--------+
|year|month|day|new_ts|hour|minute|ts_rank|   label|
+----+-----+---+------+----+------+-------+--------+
|2022|    1|  1|    13|  13|    24|      1|       7|
|2022|    1|  1|    14|  13|    24|      1|       8|
|2022|    1|  2|    15|  13|    24|      1|       7|
|2022|    1|  2|    16|  13|    44|      7|       8|
+----+-----+---+------+----+------+-------+--------+

Here's my desired output

+----+-----+---+-------+--------+
|year|month|day|      7|       8|
+----+-----+---+-------+--------+
|2022|    1|  1|     13|      14|
|2022|    1|  2|     15|      16|
+----+-----+---+-------+--------+

Here's the equivalent pandas code

df_pivot = df.pivot(index=["year","month","day"], columns="label", values="new_ts").reset_index()

What I tried

df_pivot = df.groupBy(["year","month","day"]).pivot("label").value("new_ts")

Note: sorry, I can't show my error message here, because I'm using a cloud solution and it only shows the line of the error, not the error message itself.

CodePudding user response:

from pyspark.sql.functions import first

df.groupBy("year", "month", "day").pivot("label").agg(first("new_ts")).show()


+----+-----+---+---+---+
|year|month|day|  7|  8|
+----+-----+---+---+---+
|2022|    1|  1| 13| 14|
|2022|    1|  2| 15| 16|
+----+-----+---+---+---+
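For completeness, below is a minimal, self-contained sketch of this approach. The sample rows are taken from the tables above; the SparkSession setup is an assumption so the snippet can run on its own. The key point is that in PySpark pivot() must be followed by an aggregation, and since each (year, month, day, label) group here holds a single row, first("new_ts") simply picks that one value.

from pyspark.sql import SparkSession
from pyspark.sql.functions import first

spark = SparkSession.builder.getOrCreate()

# Rebuild the sample input shown in the question
df = spark.createDataFrame(
    [
        (2022, 1, 1, 13, 13, 24, 1, 7),
        (2022, 1, 1, 14, 13, 24, 1, 8),
        (2022, 1, 2, 15, 13, 24, 1, 7),
        (2022, 1, 2, 16, 13, 44, 7, 8),
    ],
    ["year", "month", "day", "new_ts", "hour", "minute", "ts_rank", "label"],
)

# groupBy + pivot + agg: each distinct label becomes a column,
# filled with the (single) new_ts value of that group
df_pivot = (
    df.groupBy("year", "month", "day")
      .pivot("label")
      .agg(first("new_ts"))
)
df_pivot.show()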