JOB_ID | DAY_ID |
---|---|
430 | 120 |
430 | 720 |
530 | 120 |
530 | 620 |
630 | 120 |
630 | 120 |
630 | 620 |
630 | 620 |
Should become:
JOB_ID | DAY_ID |
---|---|
430 | 720 |
530 | 620 |
630 | 620 |
630 | 620 |
Where only the rows where day_id is the maximum value is kept.
I'm currently using Excel's Power Query, so Python and R scripts are not available.
CodePudding user response:
To keep the duplicates at the end, you have to fiddle around with the M. Here you go.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjE2UFDSUTI0MlCK1YFzzaFcU1RZKNcMyjVDlcXONcPCjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"JOB_ID " = _t, DAY_ID = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"JOB_ID ", Int64.Type}, {"DAY_ID", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"JOB_ID "}, {{"All", each
let a = List.Max(_[DAY_ID]),
b = List.Select(_[DAY_ID], each _ = a)
in b
, type list}}),
#"Expanded All" = Table.ExpandListColumn(#"Grouped Rows", "All")
in
#"Expanded All"
CodePudding user response:
You may want to group by Job_ID and choose to aggregate max Day_ID
edit: (see below comment) matching against original list allows to keep duplicates of Job_ID with same max Day_ID