Here's my input
+----+-----+---+------+----+------+-------+-----+
|year|month|day|new_ts|hour|minute|ts_rank|label|
+----+-----+---+------+----+------+-------+-----+
|2022|    1|  1|    13|  13|    24|      1|    7|
|2022|    1|  1|    14|  13|    24|      1|    8|
|2022|    1|  2|    15|  13|    24|      1|    7|
|2022|    1|  2|    16|  13|    44|      7|    8|
+----+-----+---+------+----+------+-------+-----+
Here's my expected output
+----+-----+---+---+---+
|year|month|day|  7|  8|
+----+-----+---+---+---+
|2022|    1|  1| 13| 14|
|2022|    1|  2| 15| 16|
+----+-----+---+---+---+
Here's the pandas code
df_pivot = df.pivot(index=["year","month","day"], columns="label", values="new_ts").reset_index()
What I tried in PySpark
df_pivot = df.groupBy(["year","month","day"]).pivot("label").value("new_ts")
Note: sorry, I can't show the error message here. I'm using a cloud solution that only shows the line where the error occurred, not the message itself.
Answer:
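In PySpark, pivot() on grouped data has no value() method; it has to be followed by an aggregation. Each (year, month, day, label) combination has a single row here, so first() is enough:
from pyspark.sql.functions import first
df.groupBy("year", "month", "day").pivot("label").agg(first("new_ts")).show()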
+----+-----+---+---+---+
|year|month|day|  7|  8|
+----+-----+---+---+---+
|2022|    1|  1| 13| 14|
|2022|    1|  2| 15| 16|
+----+-----+---+---+---+
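If it helps to see what a group, pivot, take-first operation computes, here is a rough plain-Python sketch of the same logic on the sample rows from the question. It does not use Spark at all: `pivot_first` and the list-of-dicts layout are made up for illustration. "First" here means first in list order, whereas Spark's first() is only predictable when each cell has a single row (as in this data) or the rows are explicitly ordered.

```python
def pivot_first(rows, index, columns, values):
    """Group rows by the `index` keys and spread the distinct `columns`
    values into new columns, keeping the first `values` entry per cell.

    Hypothetical helper for illustration only; not part of PySpark or pandas.
    """
    table = {}
    for row in rows:
        key = tuple(row[name] for name in index)
        cells = table.setdefault(key, {})
        # setdefault keeps the first value seen for each (group, label) cell
        cells.setdefault(row[columns], row[values])

    # One output column per distinct label, in sorted order
    labels = sorted({row[columns] for row in rows})

    result = []
    for key in sorted(table):
        out = dict(zip(index, key))
        for label in labels:
            # Cells with no matching input row become None
            out[label] = table[key].get(label)
        result.append(out)
    return result


# Sample rows from the question (only the columns the pivot needs)
rows = [
    {"year": 2022, "month": 1, "day": 1, "new_ts": 13, "label": 7},
    {"year": 2022, "month": 1, "day": 1, "new_ts": 14, "label": 8},
    {"year": 2022, "month": 1, "day": 2, "new_ts": 15, "label": 7},
    {"year": 2022, "month": 1, "day": 2, "new_ts": 16, "label": 8},
]

pivoted = pivot_first(rows, ["year", "month", "day"], "label", "new_ts")
for line in pivoted:
    print(line)
```

Each printed dict corresponds to one row of the pivoted table: the three grouping keys, then one entry per label.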