Home > Net >  Replace second or more instances of duplicates with null
Replace second or more instances of duplicates with null

Time:07-16

I have the following data with duplicates which I wish to identify. I do not wish to remove these so unique value only won't work. I want to be able to identify them but just saying null.

enter image description here

I have attempted to self-reference the code but end up just duplicating the original result.

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type"),
    #"Merged Queries" = Table.NestedJoin(Source, {"Column1"}, #"Removed Duplicates", {"Column1"}, "Removed Duplicates", JoinKind.LeftOuter)
in
    #"Merged Queries"

CodePudding user response:

You can use List.Generate to generate a list with your requirements. And then you can either replace the first column or add the list as a second column.

This needs to be done in the Advanced Editor.

Please note there is a difference between the text string "null" and the power query null value. Based on your example screenshot, I assumed you wanted the "null" text string. If you prefer the null value, remove the surrounding quotes in the code

M Code

let

//Change next line to reflect your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table13"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),

//change 2nd and later duplicates to null
    dupsNull = List.Generate(
        ()=>[v=#"Changed Type"[Column1]{0}, idx=0],
        each [idx]<Table.RowCount(#"Changed Type"),
        each [v=if List.PositionOf(#"Changed Type"[Column1],#"Changed Type"[Column1]{[idx] 1},Occurrence.First) = [idx] 1
                    then #"Changed Type"[Column1]{[idx] 1} else "null", idx=[idx] 1],
        each [v]),

//either add as a column or replace the first column
    #"add de-duped" = Table.FromColumns(
            Table.ToColumns(#"Changed Type") & {dupsNull},
            type table[Column1=text, Column2=text])
in
    #"add de-duped"

enter image description here

CodePudding user response:

Here's another way. First, add an index column. Then add another column using List.PositionOf to get the row of the first occurrence of each value in the column. Then add one last column to compare the index and List.PositionOf, to determine which row entries should be null.

Let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each List.PositionOf(#"Added Index"[Column1],[Column1])),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each if [Index] = [Custom] then [Column1] else null)
in
    #"Added Custom1"
  • Related