I have the following data in Excel, which I'm importing into PowerBI.
In the short description, there is a code (immediately after IDN) in each row - I need to extract just the number. THe number is not always the same length and it may be followed by a space, or another character (a - in the screenshot).
In excel I can use: =SEARCH("IDN",A2) to find the start of the IDN text - FirstDetectIDN
I can then find the next space (NextSpace) using find again: =FIND(" ",A2,B2)
I use the same to find the NextSpace2 - so I now have the starting and end position of the spaces surrounding the number I want to extract.
But that gives me the extra characters on the end of the number ("-EOL" above in the screenshot) that I don't want.
Is there any way in PowerBI that I can replicate all of that in one new calculated column AND also only extract the number part (so for the second line, I would only want 784729 in the new calculated field).
Thanks for any suggestions,
Mark
CodePudding user response:
This type of data cleaning should be done in Power Query.
Add a new column and type in the following code:
let
a = Text.AfterDelimiter([Column1],"IDN"),
b = List.Transform({a}, each Text.Select(_, {"0".."9"}))
in b{0}
Full code:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCk4tUzBV8HTxU7A0MbcwVtBVcCxNyfTITM9QitWJVvJNLUksSszNL80rASsytzAxN7LUdfX3AaoMT03JTU1Rio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each let
a = Text.AfterDelimiter([Column1],"IDN"),
b = List.Transform({a}, each Text.Select(_, {"0".."9"}))
in b{0})
in
#"Added Custom"
CodePudding user response:
If you have digits AFTER the set of digits comprising the IDN number, then try this slightly more complicated version:
- First split on
IDN
- Then split on the transition from digit to non-digit
- Note that by specifying the Added Column as
type text
, we will retain any leading zeros in theIDN
. If you prefer, you can specify astype number
orInt64.Type
which will drop any leading zeros.
let
//Change next line to reflect actual data source
Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],
//Set data type
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Short description", type text}}),
//Extract the first set of digits after "IDN"
#"Added Custom" = Table.AddColumn(#"Changed Type", "IDN", each
Text.Trim(
Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c))
(Text.AfterDelimiter([Short description],"IDN")){0}), type text)
in
#"Added Custom"