Spark UDF error AttributeError: 'NoneType' object has no attribute '_jvm'

Time:04-30

I found a similar question (link), but none of its answers explains how to fix the issue.

I want to write a UDF that extracts words from a column. That is, I want to create a column named new_column by applying my UDF to old_column:

from pyspark.sql.functions import col, regexp_extract, udf

re_string = 'some|words|I|need|to|match'

def regex_extraction(x,re_string):
    return regexp_extract(x,re_string,0)

extracting = udf(lambda row: regex_extraction(row,re_string))

df = df.withColumn("new_column", extracting(col('old_column')))

AttributeError: 'NoneType' object has no attribute '_jvm'

How can I fix my function? I have many columns and want to loop through a list of columns and apply my UDF to each of them.

CodePudding user response:

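You don't need a UDF. A UDF is only required when something cannot be done with PySpark's built-in functions, so that you have to fall back on plain Python functions or libraries. The error occurs because regexp_extract is not a Python string function: it builds a column expression through the JVM on the driver, and inside a UDF, which runs on the executors, there is no active SparkContext, hence "'NoneType' object has no attribute '_jvm'". In your case you can write an ordinary function that accepts a column and returns a column; no UDF is needed.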

from pyspark.sql.functions import regexp_extract
df = spark.createDataFrame([('some match',)], ['old_column'])

re_string = 'some|words|I|need|to|match'

def regex_extraction(x, re_string):
    return regexp_extract(x, re_string, 0)

df = df.withColumn("new_column", regex_extraction('old_column', re_string))
df.show()
# +----------+----------+
# |old_column|new_column|
# +----------+----------+
# |some match|      some|
# +----------+----------+

"Looping" through columns in a list can be implemented this way:

from pyspark.sql.functions import regexp_extract
cols = ['col1', 'col2']
df = spark.createDataFrame([('some match', 'match')], cols)

re_string = 'some|words|I|need|to|match'
def regex_extraction(x, re_string):
    return regexp_extract(x, re_string, 0)

df = df.select(
    '*',
    *[regex_extraction(c, re_string).alias(f'new_{c}') for c in cols]
)
df.show()
# +----------+-----+--------+--------+
# |      col1| col2|new_col1|new_col2|
# +----------+-----+--------+--------+
# |some match|match|    some|   match|
# +----------+-----+--------+--------+
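
If a UDF really is required (for logic that has no built-in Spark equivalent), its body has to be plain Python, because functions from pyspark.sql.functions cannot be called on the executors. Below is a minimal sketch of that, assuming Python's re module is an acceptable stand-in for the Java regex engine that Spark uses. The names first_match and add_extracted_column are hypothetical and not part of the answer above.

```python
import re

re_string = 'some|words|I|need|to|match'
pattern = re.compile(re_string)


def first_match(text):
    """Plain-Python counterpart of regexp_extract(col, re_string, 0)."""
    if text is None:
        return None  # keep nulls as nulls
    m = pattern.search(text)
    # regexp_extract also yields an empty string when nothing matches
    return m.group(0) if m else ''


def add_extracted_column(df, src, dst):
    """Wrap first_match in a UDF and apply it to column `src` (hypothetical helper)."""
    # Imported here so first_match stays usable without a Spark installation.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    extracting = udf(first_match, StringType())
    return df.withColumn(dst, extracting(col(src)))
```

It would be called as df = add_extracted_column(df, 'old_column', 'new_column'). A Python UDF is generally slower than a built-in function, so the approach above with regexp_extract remains preferable whenever a built-in exists.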