Home > Blockchain >  Extract first integers/decimals from units/string using regex in excel
Extract first integers/decimals from units/string using regex in excel

Time:05-13

I wish to separate the following data using regex function as follows:

enter image description here

Function to be used:

let   fx=(text,regex)=>
    Web.Page(
        "<script>
            var x='"&text&"';
            var y=new RegExp('"&regex&"','g');
            
            var b=x.match(y);
            document.write(b);
        </script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0}

in
fx

Agurments:

  • text - Column1
  • List item

regex - \\d \\.?\\d

This successfully extracts the numerical values, however:

  1. I am unsure if this is the correct regex to remove the first integer/number.
  2. I am unsure how to use regex to extract only the units. It seems to run into errors despite various attempts. e.g. enter image description here

    data:

    1200 mg/kg bw/day
    24 mg/kg/day
    0.79 mg/kg bw/day
    15 ng/m3
    15 ng/m 3
    Not Limited
     30mg/m³
    

    CodePudding user response:

    There is no issue with the function itself, the problem lays with the patterns used:


    Numeric part: You currently use \\d \\.?\\d which basically means; "Any 1 digit followed by an optional dot and at least another 1 digits". So, at least two digits anywhere in a string. The appropriate regex would be:

    ^\\d (?:\\.\\d )?
    

    Meaning:

    • ^ - Start-line anchor;
    • \\d - 1 (Greedy) digits;
    • (?:\\.\\d )? - Optional non-capture group to match a literal dot followed by at least 1 more digits.

    Units: Your pattern [\D ][2] matches a single character from the 1st class that is either a non-digit or a literal plus sign. Your 2nd character class matches a literal 2. So you are looking for patterns like 'A2' or ' 2' etc. The appropriate regex would be dependend on your input.


    Proposal:

    In a enter image description here

    let
        Source = Excel.CurrentWorkbook(){[Name="Tabel1"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
        #"Invoked Custom Function1" = Table.AddColumn(#"Changed Type", "Nr", each fnRegexExtr([Column1], "^(\\d (?:\\.\\d )?)?\\s*(. )$", "$1")),
        #"Invoked Custom Function2" = Table.AddColumn(#"Invoked Custom Function1", "Unit", each fnRegexExtr([Column1], "^(\\d (?:\\.\\d )?)?\\s*(. )$", "$2")),
        #"Replaced Errors" = Table.ReplaceErrorValues(#"Invoked Custom Function2", {{"Nr", null}, {"Unit", null}})
    in
        #"Replaced Errors"
    

    A 2nd option would be to replace the value with a delimiter where you later split on:

    let
        Source = Excel.CurrentWorkbook(){[Name="Tabel1"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
        #"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "fnRegexExtr", each fnRegexExtr([Column1], "^(\\d (?:\\.\\d )?)?\\s*(. )$", "$1|$2")),
        #"Split Column by Delimiter" = Table.SplitColumn(#"Invoked Custom Function", "fnRegexExtr", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Nr", "Unit"})
    in
        #"Split Column by Delimiter"
    

    CodePudding user response:

    Thanks for posting. Looks like you need to match characters up the first space. For that you can use the following regex expression.

    ^[^\s] 
    
  • Related