How to create new string column in PySpark DataFrame based on values of other columns?


I have a PySpark dataframe that has a couple of fields, e.g.:

Id Name Surname
1 John Johnson
2 Anna Maria

I want to create a new column that combines the values of other columns into a new string. The desired output is:

Id Name Surname New
1 John Johnson Hey there John Johnson!
2 Anna Maria Hey there Anna Maria!

I'm trying to do (pseudocode):

df = df.withColumn("New", "Hey there " + Name + " " + Surname + "!")

How can this be achieved?

Answer:

You can use the concat function or format_string, like this:

from pyspark.sql import functions as F

df = df.withColumn(
    "New",  # name of the column to create
    F.format_string("Hey there %s %s!", "Name", "Surname")
)

# +---+----+-------+-----------------------+
# |Id |Name|Surname|New                    |
# +---+----+-------+-----------------------+
# |1  |John|Johnson|Hey there John Johnson!|
# |2  |Anna|Maria  |Hey there Anna Maria!  |
# +---+----+-------+-----------------------+

If you prefer using concat:

df = df.withColumn("New",
    F.concat(F.lit("Hey there "), F.col("Name"), F.lit(" "), F.col("Surname"), F.lit("!")))