I wish to separate the following data using regex function as follows:
Function to be used:
let fx=(text,regex)=>
Web.Page(
"<script>
var x='"&text&"';
var y=new RegExp('"®ex&"','g');
var b=x.match(y);
document.write(b);
</script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in
fx
Agurments:
- text - Column1
- List item
regex - \\d \\.?\\d
This successfully extracts the numerical values, however:
- I am unsure if this is the correct regex to remove the first integer/number.
- I am unsure how to use regex to extract only the units. It seems to run into errors despite various attempts. e.g.
data:
1200 mg/kg bw/day 24 mg/kg/day 0.79 mg/kg bw/day 15 ng/m3 15 ng/m 3 Not Limited 30mg/m³
CodePudding user response:
There is no issue with the function itself, the problem lays with the patterns used:
Numeric part: You currently use
\\d \\.?\\d
which basically means; "Any 1 digit followed by an optional dot and at least another 1 digits". So, at least two digits anywhere in a string. The appropriate regex would be:^\\d (?:\\.\\d )?
Meaning:
^
- Start-line anchor;\\d
- 1 (Greedy) digits;(?:\\.\\d )?
- Optional non-capture group to match a literal dot followed by at least 1 more digits.
Units: Your pattern
[\D ][2]
matches a single character from the 1st class that is either a non-digit or a literal plus sign. Your 2nd character class matches a literal 2. So you are looking for patterns like 'A2' or ' 2' etc. The appropriate regex would be dependend on your input.
Proposal:
let Source = Excel.CurrentWorkbook(){[Name="Tabel1"]}[Content], #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}), #"Invoked Custom Function1" = Table.AddColumn(#"Changed Type", "Nr", each fnRegexExtr([Column1], "^(\\d (?:\\.\\d )?)?\\s*(. )$", "$1")), #"Invoked Custom Function2" = Table.AddColumn(#"Invoked Custom Function1", "Unit", each fnRegexExtr([Column1], "^(\\d (?:\\.\\d )?)?\\s*(. )$", "$2")), #"Replaced Errors" = Table.ReplaceErrorValues(#"Invoked Custom Function2", {{"Nr", null}, {"Unit", null}}) in #"Replaced Errors"
A 2nd option would be to replace the value with a delimiter where you later split on:
let Source = Excel.CurrentWorkbook(){[Name="Tabel1"]}[Content], #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}), #"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "fnRegexExtr", each fnRegexExtr([Column1], "^(\\d (?:\\.\\d )?)?\\s*(. )$", "$1|$2")), #"Split Column by Delimiter" = Table.SplitColumn(#"Invoked Custom Function", "fnRegexExtr", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Nr", "Unit"}) in #"Split Column by Delimiter"
CodePudding user response:
Thanks for posting. Looks like you need to match characters up the first space. For that you can use the following regex expression.
^[^\s]