How to create new string column in PySpark DataFrame based on values of other columns?

Time:08-04

I have a PySpark dataframe that has a couple of fields, e.g.:

Id Name Surname
1 John Johnson
2 Anna Maria

I want to create a new column that combines the values of other columns into a new string. The desired output is:

Id Name Surname New
1 John Johnson Hey there John Johnson!
2 Anna Maria Hey there Anna Maria!

I'm trying to do (pseudocode):

df = df.withColumn("New", "Hey there " + Name + " " + Surname + "!")

How can this be achieved?

CodePudding user response:

You can use either the concat function or format_string. With format_string:

from pyspark.sql import functions as F

df = df.withColumn(
    "New", 
    F.format_string("Hey there %s %s!", "Name", "Surname")
)

df.show(truncate=False)
# +---+----+-------+-----------------------+
# |Id |Name|Surname|New                    |
# +---+----+-------+-----------------------+
# |1  |John|Johnson|Hey there John Johnson!|
# |2  |Anna|Maria  |Hey there Anna Maria!  |
# +---+----+-------+-----------------------+

If you prefer using concat:

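# Literal pieces must be wrapped in F.lit; bare strings would be read as column names
df = df.withColumn(
    "New",
    F.concat(F.lit("Hey there "), F.col("Name"), F.lit(" "), F.col("Surname"), F.lit("!"))
)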