I have a simple set of address data, shown below, and I am trying to replace the street-type abbreviations with their full names:
14851 Jeffrey Rd
43421 Margarita St
110 South Ave
In my PySpark program I am using regexp_replace to replace the abbreviations with the full names (Road, Street, etc.):
from pyspark.sql import *
from pyspark.sql.functions import when
from pyspark.sql.functions import col, regexp_replace
address = [(1,"14851 Jeffrey Rd","DE"),(2,"43421 Margarita St","NY"),(3,"13111 Siemon Ave","CA"),(4,"110 South Ave","FL")]
df = spark.createDataFrame(address,["id","address","state"])
df.withColumn("address",
when(col("address").endsWith("Rd"),regexp_replace(col("address"),"Rd","Road"))
.when(col("address").endsWith("St"),regexp_replace(col("address"),"St","Street"))
.when(col("address").endsWith("Ave"),regexp_replace(col("address"),"Ave","Avenue"))
.otherwise("address"))
.show(false)
I tried replacing col("address") with df.address or $"address", but I keep getting the same error:
TypeError: 'Column' object is not callable
P.S. I am running Spark 3.1.2.
CodePudding user response:
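Change the endsWith call to the all-lowercase endswith. PySpark's Column method follows Python's str.endswith naming, not the camelCase endsWith used in the Scala API.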
Example:
from pyspark.sql.functions import col, regexp_replace, when

df.withColumn("address",
    when(col("address").endswith("Rd"),regexp_replace(col("address"),"Rd","Road"))\
    .when(col("address").endswith("St"),regexp_replace(col("address"),"St","Street"))\
    .when(col("address").endswith("Ave"),regexp_replace(col("address"),"Ave","Avenue"))\
    .otherwise(col("address")))\
    .show(10,False)
#+---+----------------------+-----+
#|id |address               |state|
#+---+----------------------+-----+
#|1  |14851 Jeffrey Road    |DE   |
#|2  |43421 Margarita Street|NY   |
#|3  |13111 Siemon Avenue   |CA   |
#|4  |110 South Avenue      |FL   |
#+---+----------------------+-----+
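One caveat with the patterns above: regexp_replace substitutes every match in the string, so an unanchored "St" would also rewrite the "St" inside a street name such as Stanton. The sketch below illustrates the difference on plain strings with Python's re module. The address "110 Stanton St", the SUFFIXES mapping, and the helper expand_suffix are made up for illustration. Spark's regexp_replace uses Java regular expressions, where \b and $ should behave the same way for this simple case, so a pattern like "\\bSt$" is likely to carry over, but treat that as an assumption to verify on your own data.

```python
import re

# Hypothetical mapping of trailing abbreviations to full names.
SUFFIXES = {"Rd": "Road", "St": "Street", "Ave": "Avenue"}

def expand_suffix(address: str) -> str:
    """Expand a street-type abbreviation only when it is the last word."""
    for abbr, full in SUFFIXES.items():
        # \b requires a word boundary before the abbreviation,
        # $ requires the abbreviation to sit at the end of the string.
        pattern = r"\b" + re.escape(abbr) + r"$"
        if re.search(pattern, address):
            return re.sub(pattern, full, address)
    return address

# Unanchored replacement touches every occurrence of "St".
unanchored = re.sub("St", "Street", "110 Stanton St")
# Anchored replacement touches only the trailing abbreviation.
anchored = expand_suffix("110 Stanton St")
```

With the anchored pattern the endswith check becomes redundant, since a pattern that does not match leaves the string unchanged.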
CodePudding user response:
It should be endswith, not endsWith. Note the case, and note the final show call, which needs Python's False rather than false.
df.withColumn("address",
    when(col("address").endswith("Rd"), regexp_replace(col("address"), "Rd", "Road"))
    .when(col("address").endswith("St"), regexp_replace(col("address"), "St", "Street"))
    .when(col("address").endswith("Ave"), regexp_replace(col("address"), "Ave", "Avenue"))
    # Pass a Column here; a bare string would be treated as the literal value "address".
    .otherwise(col("address"))) \
    .show(truncate=False)
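For context on why the misspelling produces "'Column' object is not callable" rather than an AttributeError: as far as I know, in Spark 3.1 PySpark's Column treats an unknown attribute as a nested-field lookup and returns another Column, and it is that returned object which then gets called and fails. The toy class below is a stand-in written for illustration, not PySpark's implementation; ToyColumn and everything in it are hypothetical.

```python
class ToyColumn:
    """Hypothetical stand-in that mimics attribute fallback; not PySpark code."""

    def __init__(self, name: str):
        self.name = name

    def endswith(self, suffix: str) -> "ToyColumn":
        # A real, defined method: returns a new expression object.
        return ToyColumn(f"endswith({self.name}, {suffix})")

    def __getattr__(self, item: str) -> "ToyColumn":
        # Reached only when normal lookup fails, e.g. for "endsWith".
        if item.startswith("__"):
            raise AttributeError(item)
        return ToyColumn(f"{self.name}.{item}")

c = ToyColumn("address")
ok = c.endswith("Rd")  # defined method, works

try:
    c.endsWith("Rd")  # the fallback returns a ToyColumn, which is then called
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

This is also why swapping col("address") for df.address does not help: the object is the same kind of column either way, and the misspelled method name is what triggers the fallback.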