I was going through the Delta Lake documentation page. There is a snippet like this:
import pyspark
from delta import *
builder = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
spark = spark = configure_spark_with_delta_pip(builder).getOrCreate()
In the last line, the same variable (spark) is assigned twice. Does this do anything different? That is:
spark = spark = configure_spark_with_delta_pip(builder).getOrCreate()
(vs)
spark = configure_spark_with_delta_pip(builder).getOrCreate()
In general, in Python, is there any meaning to assigning the same variable twice in one statement?
CodePudding user response:
Short answer: It's meaningless. Aside from a few wasted cycles repeating the same assignment (the right-hand side is still evaluated only once), the end result is identical.
Long answer: There are scenarios where it could have an effect, but none of them apply to code being executed normally (not through eval/exec) that is assigning to bare names (as opposed to assigning to a dotted name, where the descriptor protocol and other forms of attribute access customization could do weird things, or assigning to the result of a subscript expression like indexing/key-lookup or slicing, where the override of __setitem__ could invoke arbitrary code).
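To make the contrast concrete, here is a minimal sketch of the cases where a doubled target is observable. The classes Tracked and LoggingDict are hypothetical, written only for this illustration; they are not part of Delta Lake or PySpark.

```python
class Tracked:
    """Hypothetical class that records every write to its `value` property."""

    def __init__(self):
        self.writes = []

    @property
    def value(self):
        return self.writes[-1]

    @value.setter
    def value(self, new):
        # The setter runs once per assignment target.
        self.writes.append(new)


class LoggingDict(dict):
    """Hypothetical dict subclass that counts __setitem__ calls."""

    def __init__(self):
        super().__init__()
        self.set_calls = 0

    def __setitem__(self, key, item):
        self.set_calls += 1
        super().__setitem__(key, item)


obj = Tracked()
obj.value = obj.value = 42   # dotted name: the setter is invoked twice

d = LoggingDict()
d["k"] = d["k"] = 42         # subscript: __setitem__ is invoked twice

x = x = 42                   # bare name: nothing observable happens twice

print(obj.writes, d.set_calls, x)
```

With a bare name there is no hook for user code to run during the store, which is why the doubled spark assignment cannot change behavior.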
In this case, on the CPython reference interpreter, all it does is add a couple extra instructions, one to duplicate the value at the top of the stack (so it can assign it to two targets without recomputing it), and one to assign to spark a second time (which just causes the old value assigned to it to be discarded).
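You can check this yourself with the standard dis module. The sketch below only compiles the statements and never runs them, so make_session is just a placeholder name standing in for the configure_spark_with_delta_pip(builder).getOrCreate() call. The exact name of the duplicating instruction depends on the CPython version, so the helper looks only at the store instructions.

```python
import dis


def store_ops(source):
    """Return (opname, target) pairs for the store instructions in module-level code."""
    code = compile(source, "<example>", "exec")
    return [
        (ins.opname, ins.argval)
        for ins in dis.get_instructions(code)
        if ins.opname.startswith("STORE_")
    ]


# make_session is a placeholder; the code is compiled but never executed.
double = store_ops("spark = spark = make_session()")
single = store_ops("spark = make_session()")

print("double:", double)
print("single:", single)

# Full listing, including the version-specific duplicate-top-of-stack instruction.
dis.dis(compile("spark = spark = make_session()", "<example>", "exec"))
```

The doubled form shows two stores to spark and a single call, while the plain form shows one store.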
The net effect of those two instructions is a couple unnecessary reference count manipulations (an extra incref when it doubles the top of the stack, and extra decref when it throws away the reference bound to spark the first time in favor of the one bound the second time), and either one additional C array lookup (for STORE_FAST/STORE_DEREF, when executed in a function to a function local or closure variable respectively, with the latter adding a couple cheap pointer dereferences to find/update the location of the closure variable) or one additional dict lookup (for the STORE_NAME/STORE_GLOBAL cases used for all other bare name assignments). The cost is trivial (low two-digit nanoseconds at most), so it's pretty harmless, just unnecessary.
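If you want a rough feel for the cost, here is a quick timeit sketch for the function-local (STORE_FAST) case. The function names are made up for this example, and the absolute numbers will vary by machine and Python version; the difference is small enough that it may well be lost in measurement noise.

```python
import timeit


def single_assign():
    spark = object          # one store to a function local
    return spark


def double_assign():
    spark = spark = object  # the same store performed twice
    return spark


N = 200_000  # calls per timing run; chosen arbitrarily for a quick check

# Take the best of several runs to reduce noise.
t_single = min(timeit.repeat(single_assign, number=N, repeat=5))
t_double = min(timeit.repeat(double_assign, number=N, repeat=5))

print(f"single: {t_single / N * 1e9:.1f} ns per call")
print(f"double: {t_double / N * 1e9:.1f} ns per call")
```

Most of the time measured here is function call overhead, not the assignment itself, which is another way of seeing how little the extra store costs.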