Home > Software engineering >  Extract a number from text in a PowerBI field and create a new calculated column
Extract a number from text in a PowerBI field and create a new calculated column

Time:11-28

I have the following data in Excel, which I'm importing into PowerBI.

Original Spreadshet

In the short description, there is a code (immediately after IDN) in each row - I need to extract just the number. THe number is not always the same length and it may be followed by a space, or another character (a - in the screenshot).

In excel I can use: =SEARCH("IDN",A2) to find the start of the IDN text - FirstDetectIDN

I can then find the next space (NextSpace) using find again: =FIND(" ",A2,B2)

I use the same to find the NextSpace2 - so I now have the starting and end position of the spaces surrounding the number I want to extract.

But that gives me the extra characters on the end of the number ("-EOL" above in the screenshot) that I don't want.

Is there any way in PowerBI that I can replicate all of that in one new calculated column AND also only extract the number part (so for the second line, I would only want 784729 in the new calculated field).

Thanks for any suggestions,

Mark

CodePudding user response:

This type of data cleaning should be done in Power Query.

Add a new column and type in the following code:

let 
a = Text.AfterDelimiter([Column1],"IDN"),
b = List.Transform({a}, each Text.Select(_, {"0".."9"}))
in b{0}

enter image description here

Full code:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCk4tUzBV8HTxU7A0MbcwVtBVcCxNyfTITM9QitWJVvJNLUksSszNL80rASsytzAxN7LUdfX3AaoMT03JTU1Rio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each let 
        a = Text.AfterDelimiter([Column1],"IDN"),
        b = List.Transform({a}, each Text.Select(_, {"0".."9"}))
        in b{0})
in
    #"Added Custom"

CodePudding user response:

If you have digits AFTER the set of digits comprising the IDN number, then try this slightly more complicated version:

  • First split on IDN
  • Then split on the transition from digit to non-digit
  • Note that by specifying the Added Column as type text, we will retain any leading zeros in the IDN. If you prefer, you can specify as type number or Int64.Type which will drop any leading zeros.
let

//Change next line to reflect actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],

//Set data type
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Short description", type text}}),

//Extract the first set of digits after "IDN"
    #"Added Custom" = Table.AddColumn(#"Changed Type", "IDN", each 
        Text.Trim(
            Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c))
                (Text.AfterDelimiter([Short description],"IDN")){0}), type text)
in
    #"Added Custom"

enter image description here

  • Related