Filter strings using a regular expression in DataWeave

Time:12-26

I am trying to filter an input with a regular expression (regex) in DataWeave. The input payload is an array:

[
  "description_1",
  "description_2",
  "Ruimte_1",
  "_1_1_Candybar",
  "_1_2_Groceryshop",
  "description_3",
  "Ruimte_2",
  "_2_1_house1",
  "_2_1_house2",
  "description_4"
]

When I filter with plain string prefixes instead of a regular expression, I get the right result:

payload filter ((item, index) -> ((item startsWith "_1_") or (item startsWith "_2_")))

This is the result:

[
  "_1_1_Candybar",
  "_1_2_Groceryshop",
  "_2_1_house1",
  "_2_1_house2"
]

The problem is that the numbers in the input can be anything, so I cannot list every prefix. That is why I tried a regex. Some DataWeave functions accept a regex, but most do not.

So I tried these alternatives:

payload filter ((item, index) -> item startsWith (/_[0-9]_/) as Regex as String)
// returns an empty array
payload filter ((item, index) -> item ~= ((/_[0-9]_/) as Regex) as String)
// returns an empty array
payload filter ((item, index) -> item ~= /_[0-9]_/)
// returns an empty array
// so I tried escaping the underscores:
payload filter ((item, index) -> item ~= /\_[0-9]\_/)
// returns an empty array

I found something that does the job:

payload filter ((item, index) -> item matches (/_[0-9]_[0-9]_[A-z0-9]*/))

Does anybody have a better idea how to solve this?

CodePudding user response:

The problem with most of the expressions you tried is that they use regular expressions, DataWeave, or both incorrectly.

  • startsWith() expects string parameters, not regular expressions.
  • The ~= operator compares two values after coercing them to the same type. It is not a regex-matching operator, so it is not clear how you expected it to work with a regular expression.
  • as Regex as String: this doesn't make sense at all. /_[0-9]_/ is already a regular expression, so coercing it to Regex does nothing, and coercing it to String means it is no longer a regex.
  • contains(), suggested in a comment, doesn't try to match the entire input. If the full pattern needs to match, it is not appropriate.

Remember that the result of the filter expression should be true or false.
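If only the prefix matters, as in the original startsWith attempt, one possible sketch is contains() with an anchored pattern. This assumes contains() looks for a match anywhere in the string, so the ^ anchor is what pins it to the start. The variable name items and the trimmed sample list are illustrative, not part of the original payload handling.

```dataweave
%dw 2.0
output application/json
// Illustrative subset of the question's input; `items` is a made-up name.
var items = ["description_1", "Ruimte_1", "_1_1_Candybar", "_2_1_house1"]
---
// contains() returns a Boolean, which is what filter needs.
// ^ anchors the pattern at the start of the string; \d+ allows any number of digits.
items filter ((item, index) -> item contains /^_\d+_/)
```

With this input, only the two strings that begin with an underscore, digits, and another underscore should pass the filter.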

The last expression you tried is perfectly fine if that is the pattern you are looking for. The matches() function returns a Boolean, which is what filter() needs. The regular expression passed to matches() must match the entire input string, not just part of it. You could replace [0-9] with \d, but the result is exactly the same.
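To illustrate the whole-string requirement, here is a small sketch that runs the same filter twice: once with a pattern that only describes the prefix, and once with .* appended so the pattern can cover the rest of the string. The variable name items and the output keys prefixOnly and wholeString are illustrative.

```dataweave
%dw 2.0
output application/json
// Illustrative subset of the question's input; `items` is a made-up name.
var items = ["description_1", "Ruimte_1", "_1_1_Candybar", "_2_1_house1"]
---
{
  // The pattern describes only "_<digit>_", so no complete string matches it.
  prefixOnly: items filter ((item, index) -> item matches /_[0-9]_/),
  // .* consumes the remainder, so strings starting with "_<digit>_" match in full.
  wholeString: items filter ((item, index) -> item matches /_[0-9]_.*/)
}
```

The first filter should come back empty, which is the same symptom as in the question, while the second should keep "_1_1_Candybar" and "_2_1_house1".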

The actual logic or criteria that you want to apply is not clear. If you want a variable number of numbers, each preceded by an underscore, you could use a repeated group. If the character after the numbers and the final underscore is expected to be a letter, you could use something like:

payload filter ($ matches /^(_\d)+_[A-Za-z].*/)
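As a sketch of the repeated-group idea, the script below widens \d to \d+ so that multi-digit numbers are also accepted. That widening is an assumption about what "any kind of numbers" means. The entries "_12_3_Groceryshop" and "_7_" are hypothetical test values added to show the edge cases, and items is an illustrative name.

```dataweave
%dw 2.0
output application/json
// Mix of values from the question and two hypothetical ones
// ("_12_3_Groceryshop" and "_7_") to exercise the pattern.
var items = [
  "description_1",
  "Ruimte_1",
  "_1_1_Candybar",
  "_12_3_Groceryshop",
  "_2_1_house1",
  "_7_"
]
---
// (_\d+)+   one or more "_<digits>" groups
// _[A-Za-z] then an underscore followed by a letter
// .*        then anything, so the whole string is covered
items filter ((item, index) -> item matches /^(_\d+)+_[A-Za-z].*/)
```

Here "_7_" is rejected because no letter follows the last underscore, and the strings that do not start with an underscore never match.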

First you need to decide exactly what logic you want. If your solution then handles all of the cases you care about, it is fine.
