I'm coming from a Python background, trying to convert a function over into Scala.
In this dummy example, I have an unknown number of DataFrames that I need to union together.
%python
list_of_dfs = [
    spark.createDataFrame(
        [('A', 'C'),
         ('B', 'E')
        ], ['dummy1', 'dummy2']),
    spark.createDataFrame(
        [('F', 'G'),
         ('H', 'I')
        ], ['dummy1', 'dummy2'])]
for i, df in enumerate(list_of_dfs):
    if i == 0:
        union_df = df
    else:
        union_df = union_df.unionAll(df)

union_df.display()
Works just how I want it to. The "union_df = union_df.unionAll(df)" line is specifically what I'm having trouble reproducing in Scala.
%scala
// ... outer loop creates each iteration's DataFrame
if (i == 0) {
  val union_df = df
} else {
  val union_df = union_df.union(df)
}
I get this "error: recursive value union_df needs type". Which I'm having trouble translating the documentation in to my solution, because the type is a dataframe. Obviously I need to actually learn something about scala, but this is the bridge I'm trying to cross right now. Appreciate any help.
CodePudding user response:
You don't need to manually manage a loop to go through the collection in Scala. Since you're trying to go from many values to one, we can use the reduce method:
import org.apache.spark.sql.DataFrame

val dfs: Iterable[DataFrame] = ???
val union_df = dfs.reduce(_ union _)
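For example, applied to the two sample DataFrames from the question (a minimal sketch, assuming this runs in a notebook where spark is an active SparkSession, so spark.implicits is available for toDF):

import org.apache.spark.sql.DataFrame
import spark.implicits._  // brings the .toDF convenience into scope

// Recreate the question's two sample DataFrames in Scala.
val dfs: Seq[DataFrame] = Seq(
  Seq(("A", "C"), ("B", "E")).toDF("dummy1", "dummy2"),
  Seq(("F", "G"), ("H", "I")).toDF("dummy1", "dummy2")
)

// reduce repeatedly unions pairs of DataFrames until one remains.
val union_df = dfs.reduce(_ union _)
union_df.show()

One caveat: reduce throws on an empty collection, so if the list can be empty, check nonEmpty first or use dfs.reduceOption(_ union _).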
CodePudding user response:
In your Scala code you have val union_df = union_df.union(df)
-> you are defining a new value and referencing it inside its own definition, which is why the compiler complains about a recursive value.
It should be something like this, with union_df declared as a var before the loop so it can be reassigned:
var union_df: DataFrame = null  // declared once, outside the loop
// ... inside the loop:
if (i == 0) {
  union_df = df
} else {
  union_df = union_df.union(df)
}
The previous answer is better; use the reduce or foldLeft/foldRight functions instead, as sketched below.
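A foldLeft version might look like this (a sketch, assuming dfs is a non-empty collection of DataFrames; seeding the fold with the first element avoids unioning against an incompatible empty DataFrame):

// Seed with the first DataFrame, then fold the rest in via union.
val union_df = dfs.tail.foldLeft(dfs.head)((acc, df) => acc.union(df))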
CodePudding user response:
I'll accept Jarrod Baker's answer since I'm sure it's more appropriate.
But what ended up working for me was instantiating it as an empty DataFrame and then doing the appends in the loop.
%scala
var union_df = spark.emptyDataFrame  // placeholder; replaced on the first iteration
// ... outer loop creates each iteration's DataFrame
if (i == 0) {
  union_df = df
} else {
  union_df = union_df.union(df)
}
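Pieced together, a complete version of that pattern might look like this (a sketch; list_of_dfs is a hypothetical stand-in for whatever the elided outer loop actually produces):

import org.apache.spark.sql.DataFrame

// Hypothetical collection standing in for the elided outer-loop source.
val list_of_dfs: Seq[DataFrame] = ???

var union_df = spark.emptyDataFrame   // placeholder only; replaced outright below
for ((df, i) <- list_of_dfs.zipWithIndex) {
  if (i == 0) {
    union_df = df                     // first DataFrame seeds the result
  } else {
    union_df = union_df.union(df)
  }
}
union_df.show()

Note that spark.emptyDataFrame has zero columns, so this only works because the i == 0 branch replaces it outright instead of unioning with it.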