Home > Software design >  Column Object Not Callable - using replace
Column Object Not Callable - using replace

Time:06-01

I'm currently trying to convert an epoch date that is returned via an API call in some JSON. I'm attempting to tidy the field up but I've tried regex's etc. and get the same error so I've tried to do something basic.

This is an example of the dataframe at the moment

 -------- --------------------- 
|d.userId|d.startDate          |
 -------- --------------------- 
|60000033|/Date(1659398400000)/|
|100003  |/Date(1635724800000)/|
|2       |/Date(1672531200000)/|
|100010  |/Date(1635724800000)/|

This is the code which generates the above dataframe.

from pyspark.sql.functions import explode, col
from pyspark.sql import functions as f

jsonDF2 = jsonDF.withColumn('d', explode(col('d.results')))
jsonDF2 = jsonDF2.select(jsonDF2.d.userId, jsonDF2.d.startDate)#.show(10, False)

The following line ends up with the error.

jsonDF3 = jsonDF2.withColumn("startDate", col('startDate').replace("\Date(", "")).show(10, False) 

Column Object Not Callable

CodePudding user response:

.replace is a python function not a pyspark df function use regexp_replace instead

jsonDF3 = jsonDF2.withColumn("startDate",regexp_replace(col('startDate'),"\d ","")).show(10,False) 

CodePudding user response:

You try to replace on the Column object. Column object does not have replace method. These are the methods of Column class.

You probably wanted to use this replace function. Sadly, it does not have DataFrame API. You can only access it using expr.

from pyspark.sql import functions as F

jsonDF2 = spark.createDataFrame(
    [(60000033, '/Date(1659398400000)/'),
     (100003, '/Date(1635724800000)/'),
     (2, '/Date(1672531200000)/'),
     (100010, '/Date(1635724800000)/')],
    ['d.userId', 'd.startDate']
)
jsonDF3 = jsonDF2.withColumn("startDate", F.expr("replace(`d.startDate`, '/Date(', '')"))

jsonDF3.show(truncate=0) 
#  -------- --------------------- --------------- 
# |d.userId|d.startDate          |startDate      |
#  -------- --------------------- --------------- 
# |60000033|/Date(1659398400000)/|1659398400000)/|
# |100003  |/Date(1635724800000)/|1635724800000)/|
# |2       |/Date(1672531200000)/|1672531200000)/|
# |100010  |/Date(1635724800000)/|1635724800000)/|
#  -------- --------------------- --------------- 

However, regexp_extract would probably be a better choice:

jsonDF3 = jsonDF2.withColumn("startDate", F.regexp_extract('`d.startDate`', '\d ', 0))
jsonDF3.show(truncate=0) 
#  -------- --------------------- ------------- 
# |d.userId|d.startDate          |startDate    |
#  -------- --------------------- ------------- 
# |60000033|/Date(1659398400000)/|1659398400000|
# |100003  |/Date(1635724800000)/|1635724800000|
# |2       |/Date(1672531200000)/|1672531200000|
# |100010  |/Date(1635724800000)/|1635724800000|
#  -------- --------------------- ------------- 
  • Related