Strip part of a string in a column using PySpark

I am trying to strip part of a string in a column using .withColumn.

This is how the values are in the column:

df1["column1"] = ["Temp 1 (gen. comb.)", "Temp 1", "Temp 2 (gen. comb.)", "Temp 2","Temp 3 (gen. comb.)", "Temp 3"]

I want to strip the substring (gen. comb.) from the values in the column.

Code I tried in PySpark:

result_df = res.withColumn('c_model_detail', F.regexp_replace('column1', '(gen. comb.)', ''))

But when I try the above, the resulting column looks like this:

 result_df["column1"] = ["Temp 1 ()", "Temp 1", "Temp 2 ()", "Temp 2","Temp 3 ()", "Temp 3"]

Can anyone help me out with this? What is the mistake in the code I wrote?

In pandas I tried this code and it works:

result_df["column1"] = df["column1"].str.replace(" (gen. comb.)", "", regex=False)

Can anyone tell me how I can strip the string using PySpark?

CodePudding user response:

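Since regexp_replace treats its pattern as a regular expression, you need to escape the parentheses and the dots. Unescaped, the parentheses form a capture group, so only the text inside them is matched and the literal "()" is left behind; an unescaped dot matches any character.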

This should work:

result_df = df.withColumn('c_model_detail', F.regexp_replace('column1', r' \(gen\. comb\.\)', ''))
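
If you want to see the difference without starting a Spark session, here is a minimal sketch using Python's built-in re module. This is an assumption-laden stand-in: Spark's regexp_replace uses Java regular expressions rather than Python's, but for this particular pattern (literal parentheses and dots) the two should behave the same way. The list name values is just a hypothetical sample taken from the question.

```python
import re

# Sample values copied from the question (hypothetical stand-in for the column)
values = ["Temp 1 (gen. comb.)", "Temp 1", "Temp 2 (gen. comb.)", "Temp 2"]

# Unescaped pattern: the parentheses define a capture group, so only the
# text "gen. comb." is matched and the literal "()" stays in the string.
unescaped = [re.sub('(gen. comb.)', '', v) for v in values]

# Escaped pattern (raw string): the parentheses and dots are matched
# literally, and the leading space is removed as well.
escaped = [re.sub(r' \(gen\. comb\.\)', '', v) for v in values]

print(unescaped)
print(escaped)
```

The first list reproduces the "Temp 1 ()" result from the question, while the second contains only the clean names. Writing the pattern as a raw string (the r prefix) keeps Python from interpreting the backslashes before the pattern reaches the regex engine.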