I'm trying to normalise some data that is supplied in Excel. The data is made up of a number of dimension columns followed by several measure columns over time. Unfortunately the data comes in with a single "Measure/Year" identifier which means that if there are 10 years of data and 4 measures, there will be 40 measure columns.
I can't select specific columns to unpivot as the number of columns will change over time and I want to automate this completely.
A simplified sample of data looks like this (just showing 2 measures over 3 years in this example - but potentially 5 measures over an ever increasing number of years).
Country | Category | Product | QTY_2018 | QTY_2019 | QTY_2020 | Value_2018 | Value_2019 | Value_2020 |
---|---|---|---|---|---|---|---|---|
France | Fruit | Apple | 10 | 20 | 30 | 11 | 22 | 33 |
France | Fruit | Orange | 40 | 50 | 60 | 44 | 55 | 66 |
Germany | Veg | Carrot | 70 | 80 | 90 | 77 | 88 | 99 |
What I would like to achieve is...
Country | Category | Product | Year | QTY | Value |
---|---|---|---|---|---|
France | Fruit | Apple | 2018 | 10 | 11 |
France | Fruit | Apple | 2019 | 20 | 22 |
France | Fruit | Apple | 2020 | 30 | 33 |
France | Fruit | Orange | 2018 | 40 | 44 |
France | Fruit | Orange | 2019 | 50 | 55 |
France | Fruit | Orange | 2020 | 60 | 66 |
Germany | Veg | Carrot | 2018 | 70 | 77 |
Germany | Veg | Carrot | 2019 | 80 | 88 |
Germany | Veg | Carrot | 2020 | 90 | 99 |
So far I have selected all the non-measure columns and then applied a transform "Unpivot other columns", and then creating 2 custom columns to get the measure name (Qty or Value in this example) and the year. This gets around the problem of the varying number of measure columns but that only gets me so far.
I now have data that looks like this
Country | Category | Product | Year | Measure | Amount |
---|---|---|---|---|---|
France | Fruit | Apple | 2018 | QTY | 10 |
France | Fruit | Apple | 2018 | Value | 11 |
and so on...
Notes:
- The measure label column will always 'measurename_YYYY'
- The list of measure names is finite (4 or 5 maybe) so updating this to support more measure names if any are added will be fine as this will be rare. The number of years will increase each year but as I want end users to be able to refresh the query based on the contents of a sheet they update (the sample data above) then the varying periods must be handled in the query.
If this can be done in the datamodel I'm happy to go with that too.
I maybe going about this the wrong way with my attempts so far but my Power Query knowledge is pretty basic so any help would be gratefully received.
CodePudding user response:
You should be able to just repivot on the new Measure column to get your desired result now.
CodePudding user response:
You're nearly there. Just Pivot on your "Measure" column, to complete the output:
Unpivoted = Table.UnpivotOtherColumns(Source, {"Country", "Category", "Product"}, "Attribute", "Value"),
#"Split Column" = Table.SplitColumn(Unpivoted, "Attribute", Splitter.SplitTextByEachDelimiter({"_"}, QuoteStyle.Csv, false), {"Measure", "Year"}),
#"Pivoted Column" = Table.Pivot(#"Split Column", List.Distinct(#"Split Column"[Measure]), "Measure", "Value")