Regexp_Replace in pyspark not working properly


I am reading a csv file which is something like:

"ZEN","123"
"TEN","567"

Now when I try to replace the character E with regexp_replace, it does not give correct results:

from pyspark.sql.functions import (
    row_number, col, desc, date_format, to_date, to_timestamp, regexp_replace
)
from pyspark.sql.types import StructType, StringType

inputDirPath="/FileStore/tables/test.csv"

# fields is a list of column names read elsewhere
schema = StructType()
for field in fields:
    colType = StringType()
    schema.add(field.strip(), colType, True)

incr_df = (spark.read
           .schema(schema)
           .option("header", "false")
           .option("delimiter", "\u002c")
           .option("nullValue", "")
           .option("emptyValue", "")
           .option("multiline", True)
           .csv(inputDirPath))

for column in incr_df.columns:
    inc_new = incr_df.withColumn(column, regexp_replace(column, "E", ""))

inc_new.show()

This is not giving correct results; it appears to do nothing.

Note: I have 100 columns, so I need to use a loop.

Can someone help me spot my error?

CodePudding user response:

A list comprehension will be neater and easier. Let's try:

inc_new = inc_new.select(*[regexp_replace(x, 'E', '').alias(x) for x in inc_new.columns])

inc_new.show()
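For what it's worth, the reason the original loop "does nothing" to most columns is that every iteration builds `inc_new` from the *unmodified* `incr_df`, so each pass discards the previous pass's result and only the last column ends up replaced. Here is a minimal pure-Python sketch of that same logic error (a dict and a hypothetical `with_column` helper stand in for the DataFrame, since `withColumn` likewise returns a new object rather than mutating in place):

```python
import re

# A dict stands in for the DataFrame.
incr_df = {"a": "ZEN", "b": "TEN"}

def with_column(df, col, value):
    """Mimic DataFrame.withColumn: return a copy with one column replaced."""
    new_df = dict(df)
    new_df[col] = value
    return new_df

# Buggy loop: always derives from the original incr_df,
# so only the last column's replacement survives.
for column in incr_df:
    inc_new = with_column(incr_df, column, re.sub("E", "", incr_df[column]))
print(inc_new)  # {'a': 'ZEN', 'b': 'TN'} -- column "a" is untouched

# Fixed loop: reassign the accumulator each iteration,
# so every replacement builds on the previous one.
inc_new = incr_df
for column in incr_df:
    inc_new = with_column(inc_new, column, re.sub("E", "", inc_new[column]))
print(inc_new)  # {'a': 'ZN', 'b': 'TN'}
```

In real PySpark the same fix is `inc_new = inc_new.withColumn(...)` inside the loop (after initializing `inc_new = incr_df`), though the single `select` with a list comprehension shown above is still preferable: it produces one projection instead of 100 stacked `withColumn` plan nodes.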