I have a PySpark dataframe that has a couple of fields, e.g.:
Id | Name | Surname |
---|---|---|
1 | John | Johnson |
2 | Anna | Maria |
I want to create a new column that combines the values of the other columns into a new string. The desired output is:
Id | Name | Surname | New |
---|---|---|---|
1 | John | Johnson | Hey there John Johnson! |
2 | Anna | Maria | Hey there Anna Maria! |
I'm trying to do (pseudocode):
df = df.withColumn("New", "Hey there " Name " " Surname "!")
How can this be achieved?
Answer:
You can use the concat function or format_string, like this:
from pyspark.sql import functions as F
df = df.withColumn(
"New",
F.format_string("Hey there %s %s!", "Name", "Surname")
)
df.show(truncate=False)
# +---+----+-------+-----------------------+
# |Id |Name|Surname|New                    |
# +---+----+-------+-----------------------+
# |1  |John|Johnson|Hey there John Johnson!|
# |2  |Anna|Maria  |Hey there Anna Maria!  |
# +---+----+-------+-----------------------+
If you prefer using concat:
F.concat(F.lit("Hey there "), F.col("Name"), F.lit(" "), F.col("Surname"), F.lit("!"))