How can I add several columns (still not populated) to the DataFrame in Spark Structured Streaming


I have a Kafka stream with the standard Kafka schema. I'd like to add a bunch of (not yet populated) columns so that this stream can be unioned with another DataFrame, and I'd like to reuse the schema variable:

import org.apache.spark.sql.types._

val schema = StructType(
    StructField("id", LongType, nullable = false) ::
      StructField("Energy Data", StringType, nullable = false) ::
      StructField("Distance", StringType, nullable = false) ::
      StructField("Humidity", StringType, nullable = false) ::
      StructField("Ambient Temperature", StringType, nullable = false) ::
      StructField("Cold Water Temperature", StringType, nullable = false) ::
      StructField("Vibration Value 1", StringType, nullable = false) ::
      StructField("Vibration Value 2", StringType, nullable = false) ::
      StructField("Handle Movement", StringType, nullable = false) ::
      StructField("Make Coffee", StringType, nullable = false) ::
      Nil)

Is there something like

.withColumns(schema)

that would avoid duplicating the structure and instead reuse the same schema as the source of the list of columns to be added?

UPD:

val iter = schema.iterator
while (iter.hasNext) {
  controlDataFrame = controlDataFrame.withColumn(iter.next.name, lit(""))
}

worked for me
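
The same idea can be written without a mutable iterator as a fold over the schema's fields — a sketch, assuming `controlDataFrame` is the DataFrame from above and `lit` comes from `org.apache.spark.sql.functions`:

```scala
import org.apache.spark.sql.functions.lit

// Add one empty-string column per field in the schema.
// Equivalent to the while loop above, but without mutable state.
val withEmptyColumns = schema.fields.foldLeft(controlDataFrame) {
  (df, field) => df.withColumn(field.name, lit(""))
}
```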

CodePudding user response:

Maybe you could try something like:

xs.withColumn("y", lit(null).cast(StringType))

to add empty columns. You could then get the schema from xs.schema, but I'm not sure this solves your problem if you want to reuse the original variable.
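To apply this to every field of the schema while keeping each field's declared type (rather than forcing everything to StringType), you could fold over `schema.fields` and cast each null literal to the field's `dataType`. A sketch, assuming `xs` is your streaming DataFrame and `schema` is the variable from the question:

```scala
import org.apache.spark.sql.functions.lit

// Add one typed null column per schema field.
val padded = schema.fields.foldLeft(xs) { (df, field) =>
  df.withColumn(field.name, lit(null).cast(field.dataType))
}
```

If you are on Spark 3.3 or later, `Dataset.withColumns(colsMap: Map[String, Column])` should let you add them all in one call, e.g. `xs.withColumns(schema.fields.map(f => f.name -> lit(null).cast(f.dataType)).toMap)`.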
