Union of Spark DataFrames

Time:06-23

I tried this code to add a row to a DataFrame when df2 is empty, but I get the error below and I don't understand why. I don't have any column called value.

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: The number of columns doesn't match. Old column names (1): value New column names (2): country, code

var df1 = Seq.empty[(String, String)].toDF("country", "code")

val df2 = spark.emptyDataFrame

if (df2.isEmpty) df1 = df1.union(Seq("GLOBAL" , "EMPTY").toDF("country","code"))

CodePudding user response:

DataFrames, like Datasets and RDDs, are immutable, so appending a row means creating a new DataFrame. The union() method combines two DataFrames by column position, and both sides must have the same number of columns; otherwise it throws an error.

The exception in your code comes from toDF, not from union. Seq("GLOBAL", "EMPTY") is a Seq[String], so it becomes a DataFrame with two rows and a single column, which Spark names value by default. Renaming that one column to two names (country, code) is what fails. To get one row with two columns, wrap the values in a tuple:

 val df3 = df1.union(Seq(("GLOBAL", "EMPTY")).toDF("country", "code"))
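For reference, here is a minimal end-to-end sketch of the whole flow. It assumes Spark 2.4 or later (for Dataset.isEmpty) and a local session created only for illustration; the names strings and fallbackRow are made up for this example. It is meant to be pasted into spark-shell or wrapped in a main method:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for illustration only (assumption); spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("union-sketch")
  .getOrCreate()
import spark.implicits._

// A Seq[String] becomes two rows in ONE column, named "value" by default.
val strings = Seq("GLOBAL", "EMPTY").toDF()
println(strings.columns.mkString(", ")) // prints: value

// A Seq of tuples becomes one row per tuple and one column per tuple element.
val fallbackRow = Seq(("GLOBAL", "EMPTY")).toDF("country", "code")

val df1 = Seq.empty[(String, String)].toDF("country", "code")
val df2 = spark.emptyDataFrame

// union matches columns by position, so both sides need the same column count.
// Using an if/else expression avoids reassigning a var.
val df3: DataFrame = if (df2.isEmpty) df1.union(fallbackRow) else df1

df3.show()
```

Because df2 is empty here, df3 ends up with a single row holding GLOBAL and EMPTY under the columns country and code.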