Remove leading and trailing spaces and commas from a string

Time:10-28

In rows that have values like:

' ,some value, some value,'

or

'some value, some value, '

Using PySpark, I need to remove whitespace and commas from the beginning or the end of the string. How can this be done with regexp_replace?

CodePudding user response:

In PySpark, DataFrame.fillna() or DataFrameNaFunctions.fill() is used to replace NULL/None values in all or selected DataFrame columns with zero (0), an empty string, a space, or any other constant literal value.

CodePudding user response:

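Use F.regexp_replace("text", r"(^[\s,]*|[\s,]*$)", ""). The pattern matches any run of whitespace and commas at the start or at the end of the string and replaces it with an empty string.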

Full working example:

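from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Reuse the active session if there is one, otherwise start a new one
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [
        [" ,some value, some value,"],
        ["some value, some value, "]
    ],
    ["text"]
)
df.show(truncate=False)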

[Out]:
+-------------------------+
|text                     |
+-------------------------+
| ,some value, some value,|
|some value, some value,  |
+-------------------------+


# Strip any run of whitespace and commas at the start or end of each value
df = df.withColumn("text", F.regexp_replace("text", r"(^[\s,]*|[\s,]*$)", ""))

[Out]:
+----------------------+
|text                  |
+----------------------+
|some value, some value|
|some value, some value|
+----------------------+
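
To see what the pattern does without starting a Spark session, it can be tried with Python's built-in re module. This is only a sketch: Spark evaluates regexp_replace with Java regular expressions, not Python's, although this particular pattern should behave the same way on the sample strings. The names EDGE_SEPARATORS and strip_edges are made up for this illustration.

```python
import re

# Same pattern as in the answer: a run of whitespace/commas anchored at the
# start of the string, or a run anchored at the end of the string.
EDGE_SEPARATORS = re.compile(r"(^[\s,]*|[\s,]*$)")


def strip_edges(text: str) -> str:
    """Remove leading and trailing whitespace and commas; keep interior ones."""
    return EDGE_SEPARATORS.sub("", text)


samples = [" ,some value, some value,", "some value, some value, "]
cleaned = [strip_edges(s) for s in samples]
print(cleaned)
```

Only the runs touching the start or end are removed, so the comma and space between the two values are left alone.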