Create separate columns for key, value pairs contained in two columns of Spark Dataframe in Scala

Time:07-09

I am new to Spark/Scala and have been struggling with this problem. I have looked into similar questions involving explode and split, but have had no luck so far.

Here is an example input Dataframe:

id  attr_name   attr_value
0   name        James
0   hair_color  black
1   name        George
1   hair_color  black
2   name        Jack
2   hair_color  white
2   eye_color   blue

And here is an example of the output I am looking for:

id  name    hair_color  eye_color
0   James   black
1   George  black
2   Jack    white       blue

Any help would be appreciated here, thanks!

CodePudding user response:

I believe you're looking for pivot. Your example becomes something like:

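import org.apache.spark.sql.functions.first
import spark.implicits._  // for the $"..." column syntax

// Only the values listed here become columns, so "name" must be included.
df.groupBy($"id")
  .pivot($"attr_name", Seq("name", "hair_color", "eye_color"))
  .agg(first($"attr_value"))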

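Explicitly spelling out the values in the attr_name column will give you a decent performance improvement, because Spark otherwise has to compute the distinct values itself before it can pivot. Only the values you list become columns. The agg is necessary even though you have one element in each group: pivot returns a grouped dataset, and the aggregation is what turns it back into a DataFrame.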
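For anyone who wants to try this end to end, here is a minimal, self-contained sketch using the sample rows from the question. The object name PivotAttributes, the helper widen, and the local[*] master are my own assumptions for illustration, not anything from the original post. It uses the String overload of pivot, which should behave the same as the Column form here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, first}

// Hypothetical wrapper object; the names are illustrative only.
object PivotAttributes {

  // Attribute names to turn into columns. Listing them up front spares Spark
  // from computing the distinct values of attr_name itself.
  val attrNames: Seq[String] = Seq("name", "hair_color", "eye_color")

  // Turns (id, attr_name, attr_value) rows into one row per id,
  // with one column per listed attribute name.
  def widen(df: DataFrame): DataFrame =
    df.groupBy(col("id"))
      .pivot("attr_name", attrNames)
      .agg(first(col("attr_value"))) // one value per (id, attr_name), so first is enough

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (an assumption).
    val spark = SparkSession.builder()
      .appName("pivot-attributes")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows from the question.
    val df = Seq(
      (0, "name", "James"),
      (0, "hair_color", "black"),
      (1, "name", "George"),
      (1, "hair_color", "black"),
      (2, "name", "Jack"),
      (2, "hair_color", "white"),
      (2, "eye_color", "blue")
    ).toDF("id", "attr_name", "attr_value")

    widen(df).orderBy(col("id")).show()

    spark.stop()
  }
}
```

Ids that lack one of the listed attributes (eye_color for ids 0 and 1 here) get null in that column. If you do not know the attribute names in advance, pivot("attr_name") without the list also works, at the cost of the extra distinct-values computation mentioned above.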
