I use these two string-escaping snippets a lot in my code:
1.
if (Seq("\\", "{" , "\"", "\"\"").exists(str.contains)) {
  str.replace("\"","").
    replace("{","-").
    replace("\\", "-").
    replace("\"\"","-")
}
2.
if (Seq("|", "\"").exists(str.contains)) s""""${str.replace("\"", "\"\"")}"""" else str
This runs inside a Spark cluster, so execution time is very important. Is this the best way to do it, or is there a more efficient approach?
CodePudding user response:
The code you provided does something like this:
Given some banned strings such as ["a", "b", "c"], if my string contains any of them, replace all the "a"s, "b"s and "c"s.
So the check itself (the Seq(...).exists(...) part) is redundant: a replace on a string with no match just gives back an equal string, and when the string does contain a banned string it gets scanned once for the check and again for the replaces, which roughly doubles the work. If you want to do it with Scala functions and UDFs, I suggest this:
str
  .replaceAll("[{\\\\]", "-") // replace every { or \ with - (no commas inside [...], they would be matched literally)
  .replaceAll("\"", "")       // then remove every "
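To check that dropping the pre-check does not change the result, here is a minimal sketch that puts the question's first snippet next to an unconditional version. The object name SanitizeSketch and the method names original and sanitize are made up for illustration, and the `else str` branch is an assumption, since the question's snippet has no else. The character class lists only `{` and the backslash, because inside `[...]` a comma is matched literally.

```scala
object SanitizeSketch {
  // The question's first snippet, wrapped in a method for comparison.
  // Assumption: the string is returned unchanged when nothing matches.
  // The last replace can never match, because all quotes are removed first.
  def original(str: String): String =
    if (Seq("\\", "{", "\"", "\"\"").exists(s => str.contains(s)))
      str.replace("\"", "").replace("{", "-").replace("\\", "-").replace("\"\"", "-")
    else str

  // Unconditional version: no exists(...) pre-check.
  // Regex [{\\] matches a single { or a single backslash.
  def sanitize(str: String): String =
    str.replaceAll("[{\\\\]", "-").replace("\"", "")

  def main(args: Array[String]): Unit = {
    val samples = Seq("plain", "a{b", "c\\d", "say \"hi\"", "x,y")
    samples.foreach { s =>
      // Both versions should agree on every sample.
      assert(original(s) == sanitize(s))
      println(s"$s -> ${sanitize(s)}")
    }
  }
}
```

Whether this is actually faster on your data is something to measure on the cluster. The sketch only shows that the two versions return the same strings for these samples.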
You can also chain two regexp_replace calls, which come from the Spark API. Either of the two approaches works.
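A rough sketch of the chained regexp_replace variant, assuming Spark SQL is on the classpath. The column names raw and clean, the object name RegexpReplaceSketch and the helper sanitize are placeholders, not something from the question. It applies the same two steps directly to a Column, so no UDF is involved.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

object RegexpReplaceSketch {
  // Hypothetical helper: inner call turns { and \ into -, outer call removes quotes.
  def sanitize(c: Column): Column =
    regexp_replace(regexp_replace(c, "[{\\\\]", "-"), "\"", "")

  def main(args: Array[String]): Unit = {
    // Local session only for trying it out; on a cluster you would reuse the existing one.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sanitize-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("a{b", "c\\d", "say \"hi\"").toDF("raw")
    df.withColumn("clean", sanitize(col("raw"))).show(false)

    spark.stop()
  }
}
```

Built-in column functions like this are usually easier for Spark to optimize than a UDF, but that is a general expectation rather than a measurement for this case.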