Pyspark Dataframe Convert category row values into columns with aggregate on multiple columns

Time:07-27

I have a PySpark dataframe as below:

Id  variable   old_val  new_val
a1  frequency    2.0     25.0
a1  latitude    25.762   25.729
a1  longitude  -80.192  -80.436
a2  frequency    1.0      5.0
a2  latitude    25.7     25.762
a2  longitude  -80.436  -80.192

I am trying to turn the old and new values for each variable into columns, grouped by "Id".

I would like to achieve the below ideal state:

Id  freq_old_val  freq_new_val  lat_old_val  lat_new_val  long_old_val  long_new_val
a1      2.0           25.0        25.762       25.729       -80.192      -80.436
a2      1.0            5.0        25.7         25.762       -80.436      -80.192


My useless code with a useful attempt

I am unsure if I must use explode. I am also unsure if agg can be passed two column values.

from pyspark.sql import functions as F

# This does not work: first() aggregates a single column, and the second
# argument is misinterpreted as its ignorenulls flag.
df.groupBy("id").pivot("variable").agg(F.first("old_val", "new_val"))

I am fairly new to pyspark, working my way through it. Any guidance and help is highly appreciated. Thank you for taking the time to guide.

CodePudding user response:

I think a similar question has already been answered here: How to pivot on multiple columns in Spark SQL?

Please comment if it is not clear
