I have rows with values like:
' ,some value, some value,'
or
'some value, some value, '
Using PySpark, I need to remove the whitespace and commas from the beginning and end of the string. How can this be done with regexp_replace?
CodePudding user response:
In PySpark, DataFrame.fillna() or DataFrameNaFunctions.fill() is used to replace NULL/None values in all or selected DataFrame columns with zero (0), an empty string, a space, or any constant literal value. Note that this replaces missing values; it does not strip characters from existing strings.
CodePudding user response:
Use regex F.regexp_replace("text", r"(^[\s,]*|[\s,]*$)", ""). The pattern matches any run of whitespace or commas anchored at the start (^) or end ($) of the string and replaces it with an empty string.
Full working example:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [
        [" ,some value, some value,"],
        ["some value, some value, "],
    ],
    ["text"],
)
df.show(truncate=False)
[Out]:
+-------------------------+
|text                     |
+-------------------------+
| ,some value, some value,|
|some value, some value,  |
+-------------------------+
df = df.withColumn("text", F.regexp_replace("text", r"(^[\s,]*|[\s,]*$)", ""))
df.show(truncate=False)
[Out]:
+----------------------+
|text                  |
+----------------------+
|some value, some value|
|some value, some value|
+----------------------+
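The same pattern can be sanity-checked outside Spark with Python's re module; this is just a quick sketch to confirm what the regex does (Spark's regexp_replace uses Java regex, but the syntax here is compatible), using the sample strings from the question:

```python
import re

# Same pattern as in the answer: strip any run of whitespace
# and/or commas anchored at the start or the end of the string.
PATTERN = re.compile(r"(^[\s,]*|[\s,]*$)")

def strip_edges(s: str) -> str:
    # Replace every match (the leading run and the trailing run) with "".
    return PATTERN.sub("", s)

print(strip_edges(" ,some value, some value,"))  # some value, some value
print(strip_edges("some value, some value, "))   # some value, some value
```

Note that interior commas (the one between the two values) are untouched, because the character class is anchored to the string boundaries.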