I have a list of items:
my_list = ['a', 'b', 'c']
I also have an existing DataFrame, and I want to insert my_list as a new column into it.
Example input dataframe:
from pyspark.sql import functions as F
df = spark.createDataFrame([("1", "foo"), ("2", "bar"), ("3", "baz")], ["id", "value"])
df.show()
# +---+-----+
# | id|value|
# +---+-----+
# |  1|  foo|
# |  2|  bar|
# |  3|  baz|
# +---+-----+
Desired output:
+---+-----+----------+
| id|value|new_column|
+---+-----+----------+
|  1|  foo| [a, b, c]|
|  2|  bar| [a, b, c]|
|  3|  baz| [a, b, c]|
+---+-----+----------+
CodePudding user response:
Python's built-in map can be used too:
df = df.withColumn("new_column", F.array(*map(F.lit, my_list)))
df.show()
# +---+-----+----------+
# | id|value|new_column|
# +---+-----+----------+
# |  1|  foo| [a, b, c]|
# |  2|  bar| [a, b, c]|
# |  3|  baz| [a, b, c]|
# +---+-----+----------+
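Because every element comes from F.lit on a Python string, the new column ends up as array<string>. A quick sanity check (assuming the snippet above has already run; the exact nullability flags may differ slightly between Spark versions):
df.printSchema()
# root
#  |-- id: string (nullable = true)
#  |-- value: string (nullable = true)
#  |-- new_column: array (nullable = false)
#  |    |-- element: string (containsNull = false)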
CodePudding user response:
To insert a static list as a new column:
df = df.withColumn("new_column", F.array([F.lit(x) for x in my_list]))
df.show()
Which yields:
+---+-----+----------+
| id|value|new_column|
+---+-----+----------+
|  1|  foo| [a, b, c]|
|  2|  bar| [a, b, c]|
|  3|  baz| [a, b, c]|
+---+-----+----------+
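If you need this in more than one place, the same pattern can be wrapped in a small helper. A minimal, self-contained sketch (the helper name with_constant_array is my own, and the local SparkSession setup is only needed when running outside an existing session):
from pyspark.sql import SparkSession, functions as F

def with_constant_array(df, col_name, values):
    # Wrap each Python value in F.lit and collect them into one array column.
    return df.withColumn(col_name, F.array(*[F.lit(v) for v in values]))

spark = SparkSession.builder.master("local[*]").getOrCreate()
my_list = ["a", "b", "c"]
df = spark.createDataFrame([("1", "foo"), ("2", "bar"), ("3", "baz")], ["id", "value"])
with_constant_array(df, "new_column", my_list).show()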