I am new to Spark/Scala and have been struggling with this problem. I have looked into similar questions involving `explode` and `split`, but have had no luck so far.
Here is an example input Dataframe:
id | attr_name | attr_value |
---|---|---|
0 | name | James |
0 | hair_color | black |
1 | name | George |
1 | hair_color | black |
2 | name | Jack |
2 | hair_color | white |
2 | eye_color | blue |
And here is an example of the output I am looking for:
id | name | hair_color | eye_color |
---|---|---|---|
0 | James | black | |
1 | George | black | |
2 | Jack | white | blue |
Any help would be appreciated here, thanks!
Answer:
I believe you're looking for pivot. Your example becomes something like:
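import org.apache.spark.sql.functions.first
// The $"..." column syntax needs `import spark.implicits._` in scope.
// Every attr_name value wanted as a column must be listed, including "name".
df.groupBy($"id")
  .pivot($"attr_name", Seq("name", "hair_color", "eye_color"))
  .agg(first($"attr_value"))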
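Explicitly listing the values of the `attr_name` column gives a decent performance improvement, because Spark otherwise has to run an extra job to collect the distinct values first. Note that only the listed values become columns. The `agg` is required even with one element per group: `pivot` returns a RelationalGroupedDataset, which only becomes a DataFrame once an aggregation is applied, and `first` simply picks that single value.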
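For completeness, here is a minimal self-contained sketch of the whole thing, using the sample rows from the question. The object name `PivotExample`, the helper `pivotAttributes`, and the local-mode session settings are my own assumptions for illustration, not anything from the original post, so adapt them to your setup.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.first

object PivotExample {
  // Hypothetical helper: turns (id, attr_name, attr_value) rows into one row per id.
  def pivotAttributes(df: DataFrame): DataFrame =
    df.groupBy("id")
      // Listing the values fixes the output columns and their order.
      .pivot("attr_name", Seq("name", "hair_color", "eye_color"))
      // An aggregation is mandatory after pivot; first picks the single value per cell.
      .agg(first("attr_value"))
      .orderBy("id")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .appName("pivot-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val df = Seq(
      (0, "name", "James"),
      (0, "hair_color", "black"),
      (1, "name", "George"),
      (1, "hair_color", "black"),
      (2, "name", "Jack"),
      (2, "hair_color", "white"),
      (2, "eye_color", "blue")
    ).toDF("id", "attr_name", "attr_value")

    pivotAttributes(df).show()

    spark.stop()
  }
}
```

Attributes that an id does not have (such as `eye_color` for ids 0 and 1) come out as null rather than as empty cells. If you would rather have empty strings, calling `.na.fill("")` on the result should do it.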