I have a Spark DataFrame in Databricks with an ID and 200 other columns (like a pivoted view of the data). I would like to unpivot these data into a tall object with half the columns, so I end up with 100 rows per ID. I'm using the stack function with specific column names.
My question is this: I'm new to Scala and similar languages, and unfamiliar with best practices on how to use brackets when literals span multiple lines, as below. Can I replace the double quotes with something else?
%scala
val unPivotDF = hiveDF.select($"id",
  expr("stack(100, " +
    "'cat1', cat1, " +
    "'cat2', cat2, " +
    "'cat3', cat3, " +
    //...
    "'cat99', cat99, " +
    "'cat100', cat100) as (Category, Value)"))
  .where("Value is not null")
CodePudding user response:
You can use triple quotes (""") to define multiline strings, like:
"""
some string
over multiple lines
"""
In your case this will only work if the string you're writing tolerates newlines.
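For example, the stack expression could be written as a single triple-quoted string (a minimal sketch, trimmed to three of the question's cat columns for brevity; hiveDF and the column names are taken from the question):

```scala
// Sketch: build the stack expression as one triple-quoted multiline string.
// Only cat1..cat3 shown here; the question uses cat1..cat100.
val stackExpr = """stack(3,
  'cat1', cat1,
  'cat2', cat2,
  'cat3', cat3) as (Category, Value)"""

// Then, in Spark:
// val unPivotDF = hiveDF.select($"id", expr(stackExpr)).where("Value is not null")
```

Spark SQL's expression parser treats newlines as ordinary whitespace, so the embedded line breaks are harmless here.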
Considering how repetitive it is, you could also generate the string with something like:
(1 to 100)
.map(i => s"'cat$i', cat$i")
.mkString(",")
(To be adapted to your exact needs.)
Edit: and to answer your initial question: brackets won't help in any way here.