I need to remove dashes only when they sit between letters, but the way I'm doing it requires two separate replaceAll
calls. Is there a way to do it with just one? It must handle French characters.
.replaceAll("([a-zA-ZÀ-ÿ])-([a-zA-ZÀ-ÿ])", "$1 $2")
The desired outcome is from this:
Pourquoi s'intéresse-t-elle à lui.
Ma chère je comptais là-dessus - figurez-vous.
Il y a 1-2 chemins que nous pouvons choisir.
to this:
Pourquoi s'intéresse t elle à lui.
Ma chère je comptais là dessus - figurez vous.
Il y a 1-2 chemins que nous pouvons choisir.
With the above pattern I can fix only the second sentence or, with an adjustment, only the first one, but I want to fix both with a single pattern.
Regex to match the first sentence:
https://regex101.com/r/fHQNYP/1
and to match the second sentence:
https://regex101.com/r/j4YGrj/1
CodePudding user response:
You can't match hyphens on both ends of the t word because the trailing ([a-zA-ZÀ-ÿ]) in the regex consumes the t, and there is no way for the regex to match it again during the next iteration. Have a look:
Pourquoi s'intéresse-t-elle à lui.
                   ^^^
                   First match, "e-t" (replaced with "e t")
Pourquoi s'intéresse-t-elle à lui.
                      ^
                      -- search goes on from here, no more matches!
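To watch the consumption problem happen, here is a minimal sketch built around the pattern from the question. The class and method names (ConsumingMatchDemo, replaceOnce) are made up for this illustration, and the accented letters are written as Unicode escapes only so that the file compiles regardless of source encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the question.
class ConsumingMatchDemo {
    // Same character class as in the question, with the accented range
    // written as regex escapes for U+00C0 to U+00FF.
    static final Pattern CONSUMING =
            Pattern.compile("([a-zA-Z\\u00C0-\\u00FF])-([a-zA-Z\\u00C0-\\u00FF])");

    // One replaceAll pass with the consuming pattern.
    static String replaceOnce(String input) {
        return CONSUMING.matcher(input).replaceAll("$1 $2");
    }

    public static void main(String[] args) {
        // The escapes below are e-acute and a-grave.
        String s = "Pourquoi s'int\u00e9resse-t-elle \u00e0 lui.";

        // List every match the engine finds in a single scan.
        Matcher m = CONSUMING.matcher(s);
        while (m.find()) {
            System.out.println("matched \"" + m.group() + "\" at index " + m.start());
        }

        String once = replaceOnce(s);     // only the first hyphen is replaced
        String twice = replaceOnce(once); // a second pass catches "t-e"
        System.out.println(once);
        System.out.println(twice);
    }
}
```

A single scan reports only the "e-t" match, which is why the question needed two passes: the t is already consumed when the engine reaches the second hyphen.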
You want to only match hyphens between letters, so use
.replaceAll("(?<=\\p{L})-(?=\\p{L})", " ")
Details:
(?<=\p{L}) - a positive lookbehind that matches a location immediately preceded by any Unicode letter
- - a hyphen
(?=\p{L}) - a positive lookahead that matches a location immediately followed by any Unicode letter.
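As a quick check, the lookaround pattern can be run against the three sentences from the question. This is only a sketch: the class and method names (LookaroundHyphenDemo, replaceHyphensBetweenLetters) are hypothetical, and the accented letters are written as Unicode escapes to keep the file encoding-independent.

```java
// Hypothetical demo class wrapping the lookaround pattern from this answer.
class LookaroundHyphenDemo {
    // Replaces a hyphen with a space only when a Unicode letter sits on both sides.
    // The lookarounds are zero-width, so neither letter is consumed.
    static String replaceHyphensBetweenLetters(String input) {
        return input.replaceAll("(?<=\\p{L})-(?=\\p{L})", " ");
    }

    public static void main(String[] args) {
        // The three sample sentences from the question.
        String[] samples = {
            "Pourquoi s'int\u00e9resse-t-elle \u00e0 lui.",
            "Ma ch\u00e8re je comptais l\u00e0-dessus - figurez-vous.",
            "Il y a 1-2 chemins que nous pouvons choisir."
        };
        for (String sample : samples) {
            System.out.println(replaceHyphensBetweenLetters(sample));
        }
    }
}
```

Because the letters are only asserted and never consumed, the t in -t- can serve as the right-hand letter of one match and the left-hand letter of the next. The free-standing " - " and the numeric 1-2 are left alone.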
CodePudding user response:
That is a tricky one. On one hand, you need to list the accented characters explicitly: you cannot use \w, because by default it matches only unaccented (ASCII) word characters. On the other hand, a pattern like -t- means the character after the hyphen has to be matched with zero width (a lookahead), so that it is still available as the left-hand letter of the next match.
So, I think this can solve both problems:
"Pourquoi s'intéresse-t-elle à lui. Ma chère je comptais là-dessus - figurez-vous."
.replaceAll("([a-zA-Z\\u00C0-\\u017F])-((?=[a-zA-Z\\u00C0-\\u017F]))", "$1 $2");
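In the pattern above, the second group wraps nothing but a lookahead, so it always captures the empty string and $2 contributes nothing to the replacement. Here is a sketch of a slightly trimmed variant of the same idea (capture the left letter, only look ahead at the right one); the class name, the LETTER constant and the method name are assumptions made for the example.

```java
// Hypothetical demo class; the character range is the one used in this answer.
class CaptureAndLookaheadDemo {
    // ASCII letters plus the accented range U+00C0 to U+017F.
    static final String LETTER = "[a-zA-Z\\u00C0-\\u017F]";

    // The left letter is consumed and put back via $1;
    // the right letter is only checked, never consumed.
    static String replaceHyphens(String input) {
        return input.replaceAll("(" + LETTER + ")-(?=" + LETTER + ")", "$1 ");
    }

    public static void main(String[] args) {
        // Escapes are e-acute, a-grave and e-grave.
        String s = "Pourquoi s'int\u00e9resse-t-elle \u00e0 lui. "
                 + "Ma ch\u00e8re je comptais l\u00e0-dessus - figurez-vous.";
        System.out.println(replaceHyphens(s));
    }
}
```

Since the right-hand letter is never consumed, the engine can start its next match on it, which is what makes -t- work in one pass.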
CodePudding user response:
This regular expression replaces all dashes:
stringValue.replaceAll("[-]", " ")
This one replaces dashes between words:
stringValue.replaceAll("(\\b-\\b)", " ")
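One thing to watch with the \b version: digits count as word characters, so the hyphen in 1-2 is replaced too, which the question wanted to avoid. The sketch below compares it with a letter-only lookaround on ASCII input; the class and method names are hypothetical.

```java
// Hypothetical demo class comparing word boundaries with letter lookarounds.
class WordBoundaryDemo {
    // The word-boundary pattern from this answer.
    static String withWordBoundaries(String input) {
        return input.replaceAll("(\\b-\\b)", " ");
    }

    // Letter-only alternative: a hyphen with a Unicode letter on both sides.
    static String withLetterLookarounds(String input) {
        return input.replaceAll("(?<=\\p{L})-(?=\\p{L})", " ");
    }

    public static void main(String[] args) {
        String digits = "Il y a 1-2 chemins";
        String words = "figurez-vous";

        // Digits are word characters, so the boundary version touches 1-2.
        System.out.println(withWordBoundaries(digits));
        System.out.println(withLetterLookarounds(digits));

        // Both versions replace a hyphen between plain letters.
        System.out.println(withWordBoundaries(words));
        System.out.println(withLetterLookarounds(words));
    }
}
```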
This one comes late, but it also works for the question:
stringValue.replaceAll("(^[a-zA-ZÀ-ÿ](?<![a-zA-ZÀ-ÿ])(?=[a-zA-ZÀ-ÿ])|(?<=[a-zA-ZÀ-ÿ])(?![a-zA-ZÀ-ÿ])-(?<![a-zA-ZÀ-ÿ])(?=[a-zA-ZÀ-ÿ])|(?<=[a-zA-ZÀ-ÿ])(?![a-zA-ZÀ-ÿ])$[a-zA-ZÀ-ÿ])", " ")