I have a huge excel file that I'm trying to sort by extracting the last value (column D) collected at the end of each day (column B). These data are coming from a rain gauge. Values in column D always increases during the day (but can also stay at 0) and goes back to 0 after midnight. Data look like this:
nb | date | mm | avg |
---|---|---|---|
4669 | 23:39 23/9/2020 | 0.0 | 2.3 |
4670 | 23:44 23/9/2020 | 0.8 | 2.6 |
4671 | 23:49 23/9/2020 | 1.6 | 2.9 |
4672 | 23:54 23/9/2020 | 0.8 | 2.9 |
4673 | 23:59 23/9/2020 | 0.0 | 2.9 |
4674 | 0:04 24/9/2020 | 0.0 | 0.0 |
4675 | 0:09 24/9/2020 | 0.0 | 0.0 |
4769 | 23:39 24/9/2020 | 0.0 | 12.3 |
4770 | 23:44 24/9/2020 | 1.8 | 12.6 |
4771 | 23:49 24/9/2020 | 3.6 | 12.9 |
4772 | 23:54 24/9/2020 | 1.8 | 12.9 |
4773 | 23:59 24/9/2020 | 0.0 | 12.9 |
4774 | 0:04 25/9/2020 | 0.0 | 0.0 |
4775 | 0:09 25/9/2020 | 0.0 | 0.0 |
How can I do to simplify my table with only 1 value per day (last or maximum)?
Thank you
CodePudding user response:
I would put this through PowerQuery.
- Format as table
- Data > From Table/Range
- Change formatting on Column B from Date/Time to Date assuming this is recognised as a Date/Time (Home > Data type: Date)
- Order Column A, highest to lowest (Home > Sort > Z to A)
- Add an index column (Add Column > Index Column)
- Remove duplicates on Column B Date (Home > Remove Rows > Remove Duplicates)
- Delete the index column (Home > Remove Columns > Remove Column)
You now have one value, the latest update, for each date.
The reason for the index column is because if you don't do this, the data will not remove duplicates based on the order. This forces the cache to update and store the new order of highest to lowest in step 4.
e: data posted instead of picture so here is the actual PowerQuery of it;
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"nb", Int64.Type}, {"date", type date}, {"mm", type number}, {"avg", type number}}),
#"Sorted Rows" = Table.Sort(#"Changed Type",{{"nb", Order.Descending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
#"Removed Duplicates" = Table.Distinct(#"Added Index", {"date"}),
#"Removed Columns" = Table.RemoveColumns(#"Removed Duplicates",{"Index"})
in
#"Removed Columns"
To use this;
- Format as table
- Data > From Table/Range
- Home > Advanced Editor and paste in my code however,
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
this line will change depending on what table this data is. So effectively just keep your line of this, not mine.
If this does not work due to the date not being a date, try the below code as it splits by space as a delimiter and then assigns the column as a date format;
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Split Column by Delimiter" = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"date", type text}}, "en-GB"), "date", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"date.1", "date.2"}),
#"Renamed Columns" = Table.RenameColumns(#"Split Column by Delimiter",{{"date.1", "date"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"date", type date}, {"nb", Int64.Type}, {"mm", type number}, {"avg", type number}}),
#"Sorted Rows" = Table.Sort(#"Changed Type",{{"nb", Order.Descending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
#"Removed Duplicates" = Table.Distinct(#"Added Index", {"date"}),
#"Removed Columns" = Table.RemoveColumns(#"Removed Duplicates",{"Index", "date.2"})
in
#"Removed Columns"
CodePudding user response:
I concur that PowerQuery could be a very nice go-to solution for you. But it's also possible with formulas, for example:
Formula in F2
:
=FILTER(A2:D15,B2:B15=XLOOKUP(INT(B2:B15),INT(B2:B15),B2:B15,,0,-1))