I'm trying to create a new column that builds a string from the values of 4 columns.
df.withColumn(
    "input",
    F.lit("http://address.com/process?field1={}&field2={}&field3={}&field4={}".format(F.col('field1'), F.col('field2'), F.col('field3'), F.col('field4')))
).show()
However, where I'm trying to insert the column values into the string, it's showing up as field1=Column<'field1'> instead of the actual value. I tried wrapping it in F.format_string as well, but I'm still not getting the actual values.
So what it should return is something like this, assuming each column contains the string VALUE:
http://address.com/process?field1=VALUE&field2=VALUE&field3=VALUE&field4=VALUE
CodePudding user response:
You can use the format_string function like this:
import pyspark.sql.functions as F

df = df.withColumn(
    "input",
    F.format_string(
        "http://address.com/process?field1=%s&field2=%s&field3=%s&field4=%s",
        F.col('field1'), F.col('field2'), F.col('field3'), F.col('field4')
    )
)
df.show(truncate=False)
#+------+------+------+------+--------------------------------------------------------------+
#|field1|field2|field3|field4|input                                                         |
#+------+------+------+------+--------------------------------------------------------------+
#|a     |b     |c     |d     |http://address.com/process?field1=a&field2=b&field3=c&field4=d|
#+------+------+------+------+--------------------------------------------------------------+
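As for why the original attempt printed Column<'field1'>: Python's str.format runs right away on the driver and can only embed the text form of whatever object it is given. A Column is an expression, not row data, so its text form is its repr. The sketch below reproduces this in plain Python; FakeColumn is a hypothetical stand-in for pyspark.sql.Column that only mimics its repr, so no Spark session is needed to run it.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (assumption: only the repr is mimicked)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Same shape as the text seen in the question's output
        return "Column<'{}'>".format(self.name)


template = "http://address.com/process?field1={}&field2={}"

# str.format is evaluated immediately in Python, before any rows exist,
# so it falls back to the object's string form instead of a per-row value.
url = template.format(FakeColumn("field1"), FakeColumn("field2"))
print(url)
```

F.format_string avoids this because the %s placeholders are filled in per row when the query runs, not when the Python line is executed.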
CodePudding user response:
You should use concat, not lit, so something like F.concat(F.lit('http://example.com/'), F.col('field1'), F.lit('/'), F.col('field2'))
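For the URL in the question, the concat approach needs alternating literal and column pieces. Here is a rough sketch of one way to generate them; url_parts is a hypothetical helper (not part of PySpark), and lit and col are passed in as parameters so the snippet can be run without a Spark session. With PySpark you would pass F.lit and F.col, as noted in the comment at the end.

```python
def url_parts(base, fields, lit, col):
    """Return alternating literal/column pieces for a query-string URL.

    Hypothetical helper: `lit` wraps constant text and `col` looks up a
    column by name (F.lit and F.col when used with PySpark).
    """
    parts = []
    for i, name in enumerate(fields):
        # The first field follows "?", every later field follows "&"
        prefix = (base + "?") if i == 0 else "&"
        parts.append(lit(prefix + name + "="))
        parts.append(col(name))
    return parts


BASE = "http://address.com/process"
FIELDS = ["field1", "field2", "field3", "field4"]

# Plain-string stand-ins so the pieces can be inspected without Spark:
# every "column" simply yields the text VALUE.
preview = "".join(url_parts(BASE, FIELDS, lit=lambda s: s, col=lambda n: "VALUE"))
print(preview)

# With PySpark (assuming `import pyspark.sql.functions as F`):
# df = df.withColumn("input", F.concat(*url_parts(BASE, FIELDS, F.lit, F.col)))
```

For a fixed template like this one, format_string is shorter; generating the pieces mainly helps when the list of fields varies.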