In PySpark, we can use withField:
from pyspark.sql import functions as F
df = spark.createDataFrame([((1,),)], ['mystruct'])
df = df.withColumn('mystruct', F.col('mystruct').withField('n', F.lit(3)))
df.printSchema()
# root
# |-- mystruct: struct (nullable = true)
# | |-- _1: long (nullable = true)
# | |-- n: integer (nullable = false)
How to use this in Spark SQL?
I've tried looking at the logical plan and passing the resulting expression to expr:
df.explain(True)
# == Parsed Logical Plan ==
# 'Project [update_fields('mystruct, WithField(n, 3)) AS mystruct#84]
# - LogicalRDD [mystruct#82], false
# ...
df = spark.createDataFrame([((1,),)], ['mystruct'])
df = df.withColumn('mystruct', F.expr("update_fields(mystruct, WithField(n, 3))"))
# AnalysisException: Undefined function: update_fields. This function is neither a built-in/temporary function, nor a persistent function that is qualified as spark_catalog.default.update_fields.; line 1 pos 0
CodePudding user response:
At this time, there is no such support in the SQL API.
You'll have to recreate the whole struct in order to update it, using either the named_struct or the struct function.
To reference the existing fields, you can use mystruct.*, as pointed out in the comments by @samkart:
df.createOrReplaceTempView("df")
df = spark.sql("SELECT struct(mystruct.*, 3 as n) as mystruct FROM df")
df.printSchema()
# root
# |-- mystruct: struct (nullable = false)
# | |-- _1: long (nullable = true)
# | |-- n: integer (nullable = false)
Note that you're looking at the logical plan; in the physical plan you'll see that the update is actually carried out by recreating the struct with named_struct:
# == Physical Plan ==
# *(1) Project [if (isnull(mystruct#0)) null else named_struct(_1, mystruct#0._1, n, 3) AS mystruct#2]
# - *(1) Scan ExistingRDD[mystruct#0]