Replacing all non-alphanumeric punctuation characters with empty strings

Time:12-24

I'm working on a regular expression in Talend, inside a tReplace component.

I'm moving data from Oracle to Redshift and I'm having issues with DDL length, I guess because some characters are not supported.

I have product names like

175/65 R14 Efficiency

XXX N° 5 H7DC

They have to stay as they are. But sometimes I have an NBSP (non-breaking space) inside labels, or even worse.

I saw this list of punctuation online [!"#$%&'()* ,-./:;<=>?@[\]^_{|}~°]

and I need to add it to my existing regex "[^A-Za-z0-9]"

TL;DR: Can someone help me write a regex that removes everything in a column except [A-Za-z0-9] and the punctuation list above? It must be usable in the following code (I'm using Talend, and the expression is interpreted as Java):

StringUtils.replaceAll(row1.label, "[^A-Za-z0-9]", "");

CodePudding user response:

I ended up finding the solution thanks to the help of the answers above.

I used:

[^\p{Alnum}\p{Punct}\s]
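For anyone who wants to try this outside Talend, here is a minimal sketch of how the pattern could be used in plain Java. A few things here are my assumptions and not part of the original job: the class name LabelCleaner and the clean method are hypothetical, I use java.util.regex directly instead of StringUtils.replaceAll, and I turn NBSP into a regular space before filtering so that words do not get glued together. Also note that in Java, \p{Punct} only covers ASCII punctuation by default, so the ° sign is not part of it; the sketch adds it to the character class explicitly. Backslashes are doubled because the regex lives in a Java string literal.

```java
import java.util.regex.Pattern;

class LabelCleaner {
    // Matches every character that is NOT an ASCII letter or digit, ASCII punctuation,
    // whitespace, or the degree sign (\u00B0, added by hand because \p{Punct} is ASCII-only).
    private static final Pattern UNWANTED =
            Pattern.compile("[^\\p{Alnum}\\p{Punct}\\s\u00B0]");

    // Hypothetical helper: returns the label with unsupported characters removed.
    static String clean(String label) {
        if (label == null) {
            return null;
        }
        // Assumption: an NBSP should become a normal space rather than disappear.
        String normalized = label.replace('\u00A0', ' ');
        return UNWANTED.matcher(normalized).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("175/65 R14 Efficiency"));
        System.out.println(clean("XXX N\u00B0 5\u00A0H7DC"));
    }
}
```

Inside the tReplace call from the question, the same pattern string would go where "[^A-Za-z0-9]" is now. If you do not need to keep the ° sign, drop \u00B0 from the class.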
