Let’s say we have a xlsx
file, which contains hundreds of rows in below format:
name time1 theater1 time2 theater2 time3 theater3
Nomadland 09/02/2022 Theater A 09/04/2022 Theater A 09/07/2022 Theater B
I want to transpose this table from one movie with multiple screening time and screening theater in one row, into multiple rows with one screening time and screening theater in one row, which the final result will look like:
name time theater
Nomadland 09/02/2022 Theater A
Nomadland 09/04/2022 Theater A
Nomadland 09/07/2022 Theater B
Currently, I’m using openpyxl
in Python to finish this process. But I’m wondering is there a solution in vanilla Excel
way to solve this?
CodePudding user response:
This can also be accomplished using Power Query, available in Windows Excel 2010 and Excel 365 (Windows or Mac)
To use Power Query
- Select some cell in your Data Table
Data => Get&Transform => from Table/Range
orfrom within sheet
- When the PQ Editor opens:
Home => Advanced Editor
- Make note of the Table Name in Line 2
- Paste the M Code below in place of what you see
- Change the Table name in line 2 back to what was generated originally.
- Read the comments and explore the
Applied Steps
to understand the algorithm
Assumes all movies will be in a single row
You should be able to add unlimited theater/time column pairs, and movies
Basic algorithm:
- Unpivot all columns except for the
name
- group by the
name
and the numeric part of the theater and or time col labels (which are in a column after the unpivot) - extract the time and theater from each grouping
M Code
let
//change next line to reflect actual data source
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
//routine in case you have unlimited "pairs" of columns in Source data
colHeaders = Table.ColumnNames(Source),
transforms = List.Transform(List.Alternate(colHeaders,1,1,0), each {_, type date}) &
List.Transform(List.Alternate(colHeaders,1,1,1), each {_, type text}),
#"Changed Type" = Table.TransformColumnTypes(Source, transforms),
//Unpivot except for the Name column
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"name"}, "Attribute", "Value"),
//split column to get theater/time number
#"Split Column by Character Transition" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute",
Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), {"Attribute.1","Attribute.2"}),
#"Removed Columns" = Table.RemoveColumns(#"Split Column by Character Transition",{"Attribute.1"}),
//groub by name & theater/time number and extract the theater and time
#"Grouped Rows" = Table.Group(#"Removed Columns", {"name","Attribute.2"}, {
{"time", each [Value]{0}, type date},
{"theater" ,each [Value]{1}, type text}
}),
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Attribute.2"})
in
#"Removed Columns1"