Insert a static list as a new column into PySpark dataframe

Time:10-11

I have a list of items:

my_list = ['a', 'b', 'c']

I have an existing dataframe, and I want to insert my_list as a new column into the existing dataframe.

Example input dataframe:

from pyspark.sql import functions as F
df = spark.createDataFrame([("1", "foo"), ("2", "bar"), ("3", "baz")], ["id", "value"])
df.show()
# +---+-----+
# | id|value|
# +---+-----+
# |  1|  foo|
# |  2|  bar|
# |  3|  baz|
# +---+-----+

Desired output:

+---+-----+----------+
| id|value|new_column|
+---+-----+----------+
|  1|  foo| [a, b, c]|
|  2|  bar| [a, b, c]|
|  3|  baz| [a, b, c]|
+---+-----+----------+

CodePudding user response:

One option is to wrap each list item in F.lit with map, then unpack the results into F.array:

df = df.withColumn("new_column", F.array(*map(F.lit, my_list)))

df.show()
# +---+-----+----------+
# | id|value|new_column|
# +---+-----+----------+
# |  1|  foo| [a, b, c]|
# |  2|  bar| [a, b, c]|
# |  3|  baz| [a, b, c]|
# +---+-----+----------+

CodePudding user response:

To insert a static list as a new column:

df = df.withColumn("new_column", F.array([F.lit(x) for x in my_list]))

df.show()

Which yields:

+---+-----+----------+
| id|value|new_column|
+---+-----+----------+
|  1|  foo| [a, b, c]|
|  2|  bar| [a, b, c]|
|  3|  baz| [a, b, c]|
+---+-----+----------+