We routinely receive .CSV files from a contractor that consist of tables exported out of Revit. From a data perspective, they are less than useful so I am looking for a way to ingest these files and generate a properly laid out table.
Layout of files received
These come to us in individual files and—as you will note—the column headings are not consistent between tables (but follow either of these two formats).
LEVEL 2 - CLG CONNECTIONS TAKE-OFF | ||
---|---|---|
ITEM | NO. | COMMENTS |
RCA225/54 | 684 |
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | ||
---|---|---|
ITEM | LENGTH | NO. |
362S162-54 | 1' - 10" | 30 |
362S162-54 | 1' - 11 3/4" | 4 |
362S162-54 | 2' - 5 3/4" | 4 |
362S162-54 | 2' - 11 15/16" | 14 |
362S162-54 | 3' - 7 1/4" | 20 |
362S162-54 | 4' - 11 1/4" | 28 |
Desired Outcome
I am looking for a method to import the .csv files (ideally just dumping them sequentially into one sheet for ease and speed) and reformat them into a usable layout.
LOCATION | ITEM | LENGTH | NO. | COMMENTS |
---|---|---|---|---|
LEVEL 2 - CLG CONNECTIONS TAKE-OFF | RCA225/54 | 684 | ||
LEVEL 2 - INTERIOR WALLS CONNECTIONS TAKE-OFF | RCA223-54 | 166 | ||
LEVEL 2 - INTERIOR WALLS CONNECTIONS TAKE-OFF | Steel connector - WBAC162 | 360 | ||
LEVEL 2 - INTERIOR WALLS CONNECTIONS TAKE-OFF | TIE PLATE - TP35 | 184 | ||
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | 362S162-54 | 1' - 10" | 30 | |
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | 362S162-54 | 1' - 11 3/4" | 4 | |
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | 362S162-54 | 2' - 5 3/4" | 4 | |
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | 362S162-54 | 2' - 11 15/16" | 14 | |
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | 362S162-54 | 3' - 7 1/4" | 20 | |
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | 362S162-54 | 4' - 11 1/4" | 28 |
This scenario in general is one I encounter regularly from office staff laying out spreadsheets where their focus is presentation rather than ease of data extraction. Does anyone have a good method of "converting" from one to the other?
I've been trying to utilize regexmatch
and/or query
along with counta
, split/join
, and indirect
to somehow extract a list of LOCATIONS and then count the number of rows beneath it to then extract that number of rows, but (a) I'm not sure this is a good solution and (b) I'm not getting the results I need.
CodePudding user response:
Having those two scenarios, you may be able to sort it out with some auxiliary columns (Yes, you may avoid them, but calculations and formulas would be much longer!)
First Auxiliary Column in E, replicates the Level name with SCAN:
=SCAN(,A:A,LAMBDA(a,s,IF(REGEXMATCH(s,"^(?i)Level"),s,a)))
With SCAN again you can do a second auxiliary column to find scenario 1 and 2 considering your two tables:
=SCAN(,B:B,LAMBDA(a,s,IFS(s="","",s="LENGTH",1,s="NO.",2,s=s,a)))
And you'll be able to filter columns and re-arrange them with curly brackets:
={FILTER({E:E,A:D},F:F=1,A:A<>"ITEM");
FILTER({E:E,A:A,D:D,B:C},F:F=2,A:A<>"ITEM")}
If you want you can sort it through any column:
=SORT({FILTER({E:E,A:D},F:F=1,A:A<>"ITEM");
FILTER({E:E,A:A,D:D,B:C},F:F=2,A:A<>"ITEM")},1,1)
CodePudding user response:
You may be best off by implementing this in Apps Script, but if the number of CSV files stays the same, it looks doable with a Google Sheets formula as well.
You are saying that the CSV files are imported individually. If you place each file in a Google Sheets tab of its own, such as Perimetral
and CLG
, you can use iferror()
to copy the Location name from cell A1
on each tab to every row in the result, and to put blanks in columns where data is missing on a particular tab, like this:
=arrayformula(
query(
{
"Location", Perimetral!A2:C2, "Comments";
iferror(Perimetral!A3:A/0, Perimetral!A1), Perimetral!A3:C, iferror(Perimetral!A3:A/0);
iferror(CLG!A3:A/0, CLG!A1), CLG!A3:A, iferror(CLG!A3:A/0), CLG!B3:B, CLG!C3:C
},
"where Col2 is not null", 1
)
)
The above formula requires fine-tuning to make columns fall into the right slots. To avoid that, look up column positions dynamically with filter()
, like this:
=arrayformula(
lambda(
headers, table1, table2,
query(
{
headers;
map(
headers,
lambda(
columnName,
iferror(
filter(
{ iferror(offset(table1, 0, 0, rows(table1), 1) / 0, offset(table1, -2, 0, 1, 1)), table1 },
columnName
=
{ "Location", offset(table1, -1, 0, 1, columns(table1)) }
)
)
)
);
map(
headers,
lambda(
columnName,
iferror(
filter(
{ iferror(offset(table2, 0, 0, rows(table2), 1) / 0, offset(table2, -2, 0, 1, 1)), table2 },
columnName
=
{ "Location", offset(table2, -1, 0, 1, columns(table2)) }
)
)
)
)
},
"where Col2 is not null", 1
)
)(
{ "Location", "ITEM", "LENGTH", "NO.", "Comments" }, CLG!A3:Z, Perimetral!A3:Z
)
)
This latter formula looks more complex but it should actually be pretty easy to make it work with more tables by adding tables at the end, and putting in more copies of the map()
block, as required.
The results look like this:
Location | ITEM | LENGTH | NO. | Comments |
---|---|---|---|---|
LEVEL 2 - CLG CONNECTIONS TAKE-OFF | RCA225/54 | 684 | COMMENTS | |
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | 362S162-54 | 1' - 10" | 30 | |
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | 362S162-54 | 1' - 11 3/4" | 4 | |
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | 362S162-54 | 2' - 5 3/4" | 4 | |
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | 362S162-54 | 2' - 11 15/16" | 14 | |
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | 362S162-54 | 3' - 7 1/4" | 20 | |
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF | 362S162-54 | 4' - 11 1/4" | 28 |