I extract data automatically from printed tables into Excel, and generally the solution I use is pretty good. However, because the tables are very complex, it sometimes merges several rows into one due to failed layout recognition. So, it might look like this (| indicates new cell):
Label1|A1|B1|C1
Label2|A2|B2|C2
Label3^Label4|A3^A4|B3^B4|C3^C4
Label5|A5|B5|C5
of course, I need 5 rows:
Label1|A1|B1|C1
Label2|A2|B2|C2
Label3|A3|B3|C3
Label4|A4|B4|C4
Label5|A5|B5|C5
Now, I could do it manually, but: multiply that with tens of thousands of cases... I'm in dire need of an automatable solution - any suggestions?
CodePudding user response:
Some assumptions from your data:
- No duplicates in column 1
- When rows are grouped by this data extractor, each column has the same number of grouped rows, and they are always separated by a
^
- There are no
^
's other than those used to separate these grouped rows.
You can obtain your desired output using Power Query
, available in Windows Excel 2010 and Office 365 Excel
- Select some cell in your original table
Data => Get&Transform => From Table/Range
orFrom within sheet
- When the PQ UI opens, navigate to
Home => Advanced Editor
- Make note of the Table Name in Line 2 of the code.
- Replace the existing code with the M-Code below
- Change the table name in line 2 of the pasted code to your "real" table name
- Examine any comments, and also the
Applied Steps
window, to better understand the algorithm and steps
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}}),
//group by "Column 1"
// This assumes there are no duplicates in Column 1, as shown in your example
#"Grouped Rows" = Table.Group(#"Changed Type", {"Column1"}, {
{"splitRows",(t)=>
let
colNames = Table.ColumnNames(t),
//generate lists of new rows, splitting on the ^
newRows = List.Generate(
()=>[nr=Text.Split(Table.Column(t, colNames{0}){0}, "^"), idx = 0],
each [idx]<List.Count(colNames),
each [nr=Text.Split(Table.Column(t, colNames{[idx] 1}){0}, "^"), idx = [idx] 1],
each [nr]),
newTable = Table.FromColumns(newRows,colNames)
in
newTable}
}),
#"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Column1"}),
#"Expanded splitRows" = Table.ExpandTableColumn(#"Removed Columns", "splitRows", {"Column1", "Column2", "Column3", "Column4"}, {"Column1", "Column2", "Column3", "Column4"})
in
#"Expanded splitRows"