How to create a new string column in a PySpark DataFrame based on values of other columns?

I have a PySpark dataframe that has a couple of fields, e.g.:

Id	Name	Surname
1	John	Johnson
2	Anna	Maria

I want to create a new column that combines the values of other columns into a new string. The desired output is:

Id	Name	Surname	New
1	John	Johnson	Hey there John Johnson!
2	Anna	Maria	Hey there Anna Maria!

I'm trying to do (pseudocode):

df = df.withColumn("New", "Hey there " + Name + " " + Surname + "!")

How can this be achieved?

Answer:

You can use either the concat function or format_string. With format_string:

from pyspark.sql import functions as F

df = df.withColumn(
    "New", 
    F.format_string("Hey there %s %s!", "Name", "Surname")
)

df.show(truncate=False)
# +---+----+-------+-----------------------+
# |Id |Name|Surname|New                    |
# +---+----+-------+-----------------------+
# |1  |John|Johnson|Hey there John Johnson!|
# |2  |Anna|Maria  |Hey there Anna Maria!  |
# +---+----+-------+-----------------------+

If you prefer using concat:

df = df.withColumn("New", F.concat(F.lit("Hey there "), F.col("Name"), F.lit(" "), F.col("Surname"), F.lit("!")))