I create the DataFrame with a schema in the following way:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

val rdd = sc.parallelize(
  Seq(
    Row("first", 2.0),
    Row("test", 1.5),
    Row("choose", 8.0)
  )
)
val schema: StructType = new StructType()
  .add(StructField("id", StringType, true))
  .add(StructField("val1", DoubleType, true))
val dfWithSchema = spark.createDataFrame(rdd, schema)
And I want to update the id column with an arbitrary value.
I tried this:
dfWithSchema.withColumn("id", col("id"). (Random.nextString(10)))
But it did not give the expected result. Is there any way to do this in Spark (Scala 2.13)?
CodePudding user response:
You can concatenate with Spark using the concat function:
dfWithSchema.withColumn("id", concat(col("id"),lit(Random.nextString(10)))).show()
CodePudding user response:
I found out the following solution:
dfWithSchema.withColumn("id", when(col("id").isNotNull, Random.nextString(10)))
However, I am surprised that there is no direct way to update a DataFrame column with new values, only by a condition on the existing column values.
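For reference, withColumn does replace an existing column outright, so a direct overwrite without any condition appears possible by wrapping the value in lit. A short sketch under that assumption (the variable names overwritten and perRow are illustrative; as above, Random.nextString runs once on the driver, and when without an otherwise clause leaves null for non-matching rows):

import org.apache.spark.sql.functions.{lit, rand}
import scala.util.Random

// Overwrite the existing id column unconditionally; no `when` needed.
// The literal is computed once, so every row gets the same value.
val overwritten = dfWithSchema.withColumn("id", lit(Random.nextString(10)))

// If a different value per row is wanted, Spark's rand() produces
// a per-row random double that can serve as a starting point.
val perRow = dfWithSchema.withColumn("id", rand().cast("string"))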