I am trying to strip part of a string in a column using .withColumn.
This is how the values are in the column:
df1["column1"] = ["Temp 1 (gen. comb.)", "Temp 1", "Temp 2 (gen. comb.)", "Temp 2","Temp 3 (gen. comb.)", "Temp 3"]
I want to strip the substring " (gen. comb.)" from every value in the column.
Code I tried in PySpark:
result_df = res.withColumn('c_model_detail', F.regexp_replace('column1', '(gen. comb.)', ''))
But when I try the above, the resulting column looks like this:
result_df["column1"] = ["Temp 1 ()", "Temp 1", "Temp 2 ()", "Temp 2","Temp 3 ()", "Temp 3"]
Can anyone help me out with this? What is the mistake in the code I wrote?
In pandas, I tried this code and it works:
result_df["column1"] = df["column1"].str.replace(" (gen. comb.)","",regex=False)
Can anyone tell me how I can strip this substring using PySpark?
CodePudding user response:
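regexp_replace treats its second argument as a regular expression, not a literal string. Unescaped parentheses define a capture group, so the pattern '(gen. comb.)' matches only the text between the brackets and leaves "()" behind. The dots are also metacharacters that match any character. Your pandas call works because regex=False makes the match literal. In PySpark, you need to escape the parentheses and the dots, and include the leading space in the pattern.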
This should work:
from pyspark.sql import functions as F
result_df = df.withColumn('c_model_detail', F.regexp_replace('column1', r' \(gen\. comb\.\)', ''))
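If you want to see the difference without starting a Spark session, here is a minimal sketch using Python's built-in re module. Spark's regexp_replace uses Java regular expressions rather than Python's, but for a simple pattern like this one the parentheses and dots behave the same way, so it should be a fair illustration. The list `values` is just a shortened copy of the sample data from the question, and the other variable names are made up for this example.

```python
import re

# Sample values taken from the question (shortened)
values = ["Temp 1 (gen. comb.)", "Temp 1", "Temp 2 (gen. comb.)", "Temp 2"]

# Unescaped: the parentheses form a capture group, so only the
# text inside the brackets is matched and removed.
unescaped = [re.sub('(gen. comb.)', '', v) for v in values]

# Escaped: parentheses and dots are matched literally, together
# with the leading space. The raw string keeps the backslashes intact.
escaped = [re.sub(r' \(gen\. comb\.\)', '', v) for v in values]

# re.escape builds an escaped pattern from the literal text, which
# avoids escaping by hand (shown here for Python's re only).
auto_pattern = re.escape(' (gen. comb.)')
auto_escaped = [re.sub(auto_pattern, '', v) for v in values]

print(unescaped)
print(escaped)
```

The first list reproduces the "Temp 1 ()" result from the question, and the second one gives the clean values.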