Regex flexible remove dash between characters only

Time:12-15

I need to remove dashes only when they sit between letters, but with my current approach I have to run two separate replaceAll calls. Is there a way to do it with just one? It must handle French (accented) characters.

.replaceAll("([a-zA-ZÀ-ÿ])-([a-zA-ZÀ-ÿ])", "$1 $2")

The desired outcome is from this:

Pourquoi s'intéresse-t-elle à lui.
Ma chère je comptais là-dessus - figurez-vous.
Il y a 1-2 chemins que nous pouvons choisir.

to this:

Pourquoi s'intéresse t elle à lui.
Ma chère je comptais là dessus - figurez vous.
Il y a 1-2 chemins que nous pouvons choisir.

With the pattern above I can fix only the second sentence or, with an adjustment, only the first, but I want to fix both with a single pattern.

Regex to match the first sentence:

https://regex101.com/r/fHQNYP/1

and to match the second sentence:

https://regex101.com/r/j4YGrj/1
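To make the problem concrete, here is a minimal sketch of what the question's pattern does in one pass versus two. It assumes the two-command workaround is simply the same replacement run twice; the question does not actually show the second call. The class and method names (`TwoPassDemo`, `onePass`) are illustrative, the range `À-ÿ` is written as the equivalent `\u00C0-\u00FF` regex escape, and the source file is assumed to be compiled as UTF-8.

```java
// Hypothetical demo: the question's pattern applied once, then twice.
// Assumes the source file is compiled as UTF-8.
class TwoPassDemo {
    // Same range as [a-zA-ZÀ-ÿ] (U+00C0..U+00FF), written with regex escapes
    static final String LETTER = "[a-zA-Z\\u00C0-\\u00FF]";

    // One pass: "letter-letter" becomes "letter letter"
    static String onePass(String s) {
        return s.replaceAll("(" + LETTER + ")-(" + LETTER + ")", "$1 $2");
    }

    public static void main(String[] args) {
        String s = "Pourquoi s'intéresse-t-elle à lui.";
        String once = onePass(s);
        // The first match "e-t" consumes the t, so "t-elle" survives this pass
        System.out.println(once);          // Pourquoi s'intéresse t-elle à lui.
        // A second pass catches the remaining "t-e"
        System.out.println(onePass(once)); // Pourquoi s'intéresse t elle à lui.
    }
}
```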

CodePudding user response:

You can't match the hyphens on both sides of the one-letter word t in a single pass, because the trailing ([a-zA-ZÀ-ÿ]) in the regex consumes the t, and the regex engine cannot match it again on the next iteration. Have a look:

Pourquoi s'intéresse-t-elle à lui.
                   ^^^
                   first match: "e-t", replaced with "e t"

Pourquoi s'intéresse-t-elle à lui.
                      ^
                      search resumes here; the t was already consumed, so no more matches!
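The same behaviour can be observed directly with `Matcher.find()`, which starts each new search where the previous match ended. The sketch below (class and method names are illustrative; UTF-8 source assumed) collects every match the question's pattern reports for the first sentence: only `e-t`, at offsets 19 to 22, is found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: list the matches find() reports for the question's pattern.
// Assumes the source file is compiled as UTF-8.
class ConsumedMatchDemo {
    // The question's pattern, with the range À-ÿ written as U+00C0..U+00FF escapes
    static final Pattern LETTER_HYPHEN_LETTER =
            Pattern.compile("([a-zA-Z\\u00C0-\\u00FF])-([a-zA-Z\\u00C0-\\u00FF])");

    // Returns "start-end:text" for each match, in the order find() reports them
    static List<String> matches(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = LETTER_HYPHEN_LETTER.matcher(s);
        while (m.find()) {
            // The next find() starts at m.end(), i.e. after the consumed trailing letter
            found.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [19-22:e-t]; "t-e" is never reported
        System.out.println(matches("Pourquoi s'intéresse-t-elle à lui."));
    }
}
```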

You only want to match hyphens between letters, so use lookarounds, which check the neighbouring characters without consuming them:

.replaceAll("(?<=\\p{L})-(?=\\p{L})", " ")

See the regex demo. Details:

  • (?<=\p{L}) - a positive lookbehind that matches a location immediately preceded by any Unicode letter
  • - - a hyphen
  • (?=\p{L}) - a positive lookahead that matches a location immediately followed by any Unicode letter.
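A minimal, self-contained sketch of this approach, run against the three sample sentences from the question. The class and method names are illustrative, and the source file is assumed to be compiled as UTF-8. Precompiling the `Pattern` is a design choice: `String.replaceAll` recompiles the regex on every call, which matters if this runs in a loop.

```java
import java.util.regex.Pattern;

// Hypothetical helper wrapping the lookaround pattern from this answer.
// Assumes the source file is compiled as UTF-8.
class LetterHyphens {
    // Lookbehind and lookahead are zero-width, so neighbouring letters are
    // never consumed and "-t-" is handled in a single pass
    private static final Pattern HYPHEN_BETWEEN_LETTERS =
            Pattern.compile("(?<=\\p{L})-(?=\\p{L})");

    static String replace(String s) {
        return HYPHEN_BETWEEN_LETTERS.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        String[] samples = {
            "Pourquoi s'intéresse-t-elle à lui.",
            "Ma chère je comptais là-dessus - figurez-vous.",
            "Il y a 1-2 chemins que nous pouvons choisir."
        };
        for (String s : samples) {
            System.out.println(replace(s));
        }
        // Pourquoi s'intéresse t elle à lui.
        // Ma chère je comptais là dessus - figurez vous.
        // Il y a 1-2 chemins que nous pouvons choisir.
    }
}
```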

CodePudding user response:

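That is a tricky one. On the one hand, you need to cover the accented characters explicitly: you cannot use \w, because by default in Java it only matches ASCII letters, digits and the underscore, so accented letters are excluded. On the other hand, a pattern like -t- means the letter after the hyphen has to be matched with a zero-width lookahead, so it is not consumed and can start the next match.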

So, I think this can solve both problems:

String result = "Pourquoi s'intéresse-t-elle à lui. Ma chère je comptais là-dessus - figurez-vous."
        // $2 is always empty: group 2 wraps only a zero-width lookahead
        .replaceAll("([a-zA-Z\\u00C0-\\u017F])-((?=[a-zA-Z\\u00C0-\\u017F]))", "$1 $2");
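A sketch that checks this pattern on the question's three sentences and confirms that group 2 never captures anything, since it contains only a lookahead. The class and method names are illustrative, and the source file is assumed to be compiled as UTF-8. Note that the range U+00C0 to U+017F also includes the non-letters × and ÷, which does not affect these inputs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check of the range-plus-lookahead pattern from this answer.
// Assumes the source file is compiled as UTF-8.
class RangeLookaheadDemo {
    // Letter (U+00C0..U+017F: accented Latin letters plus × and ÷), hyphen,
    // then group 2 wrapping a zero-width lookahead for another letter
    static final String REGEX =
            "([a-zA-Z\\u00C0-\\u017F])-((?=[a-zA-Z\\u00C0-\\u017F]))";

    static String replace(String s) {
        return s.replaceAll(REGEX, "$1 $2");
    }

    // Returns what group 2 captured in the first match, or null if nothing matched
    static String firstGroup2(String s) {
        Matcher m = Pattern.compile(REGEX).matcher(s);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Pourquoi s'intéresse-t-elle à lui.",
            "Ma chère je comptais là-dessus - figurez-vous.",
            "Il y a 1-2 chemins que nous pouvons choisir."
        };
        for (String s : samples) {
            System.out.println(replace(s));
        }
        System.out.println("[" + firstGroup2("figurez-vous") + "]"); // prints []
    }
}
```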

CodePudding user response:

This is the regular expression to replace all dashes.

stringValue.replaceAll("[-]", " ")

This one replaces only dashes between words (note that \b treats digits as word characters, so 1-2 is changed as well):

stringValue.replaceAll("(\\b-\\b)", " ")
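A small sketch contrasting the two patterns above. The input is deliberately ASCII-only so the result does not depend on how `\b` classifies accented letters, and the class and method names are illustrative. It shows that `[-]` also removes the free-standing ` - `, while `\b-\b` keeps it but still turns `1-2` into `1 2`, which the question wants to avoid.

```java
// Hypothetical comparison of the two patterns above on ASCII-only input.
class WordBoundaryDemo {
    // Replaces every hyphen, including a free-standing " - "
    static String allHyphens(String s) {
        return s.replaceAll("[-]", " ");
    }

    // Replaces a hyphen only when word boundaries surround it; digits are
    // word characters, so "1-2" is changed as well
    static String boundedHyphens(String s) {
        return s.replaceAll("(\\b-\\b)", " ");
    }

    public static void main(String[] args) {
        String s = "Il y a 1-2 chemins, figurez-vous - bien.";
        System.out.println(allHyphens(s));     // Il y a 1 2 chemins, figurez vous   bien.
        System.out.println(boundedHyphens(s)); // Il y a 1 2 chemins, figurez vous - bien.
    }
}
```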

This is a late answer, but it also works:

stringValue.replaceAll("(^[a-zA-ZÀ-ÿ](?<![a-zA-ZÀ-ÿ])(?=[a-zA-ZÀ-ÿ])|(?<=[a-zA-ZÀ-ÿ])(?![a-zA-ZÀ-ÿ])-(?<![a-zA-ZÀ-ÿ])(?=[a-zA-ZÀ-ÿ])|(?<=[a-zA-ZÀ-ÿ])(?![a-zA-ZÀ-ÿ])$[a-zA-ZÀ-ÿ])", " ")
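Of the three alternatives in this pattern, only the middle one can ever match: the first consumes a letter and then requires, via `(?<!...)`, that the character just before the current position is not a letter, which is that same letter; the third requires a letter right after the `$` anchor, which is impossible. The middle alternative reduces to `(?<=[a-zA-ZÀ-ÿ])-(?=[a-zA-ZÀ-ÿ])`. The sketch below (illustrative names; UTF-8 source assumed; `À-ÿ` written as the equivalent `\u00C0-\u00FF` regex escape) builds both patterns and compares them on the question's sentences.

```java
// Hypothetical comparison of the long pattern with its reduced form.
// Assumes the source file is compiled as UTF-8.
class LongPatternDemo {
    // Same range as [a-zA-ZÀ-ÿ] (U+00C0..U+00FF), written with regex escapes
    static final String LETTER = "[a-zA-Z\\u00C0-\\u00FF]";

    // The answer's pattern, assembled piece by piece
    static final String LONG =
            "(^" + LETTER + "(?<!" + LETTER + ")(?=" + LETTER + ")"
            + "|(?<=" + LETTER + ")(?!" + LETTER + ")-(?<!" + LETTER + ")(?=" + LETTER + ")"
            + "|(?<=" + LETTER + ")(?!" + LETTER + ")$" + LETTER + ")";

    // What the only reachable (middle) alternative boils down to
    static final String SHORT = "(?<=" + LETTER + ")-(?=" + LETTER + ")";

    static String withLong(String s) { return s.replaceAll(LONG, " "); }

    static String withShort(String s) { return s.replaceAll(SHORT, " "); }

    public static void main(String[] args) {
        String[] samples = {
            "Pourquoi s'intéresse-t-elle à lui.",
            "Ma chère je comptais là-dessus - figurez-vous.",
            "Il y a 1-2 chemins que nous pouvons choisir.",
            "abc"
        };
        for (String s : samples) {
            String out = withLong(s);
            System.out.println(out + " | same as short form: " + out.equals(withShort(s)));
        }
    }
}
```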