Home > Net >  Preparing a csv file splitting data in one cell into multiple lines for hundreds of records
Preparing a csv file splitting data in one cell into multiple lines for hundreds of records

Time:03-03

I have a .csv data extract from a system that is setup in the following format, the extract cannot be changed but I am wanting to transform the data before loading it into either a Google Sheet, MS Access DB or into a SQL Server DB. It is likely Google Sheets will be used as a proof of concept with the production element being in an SQL Server DB.

id 07/02/2022 08/02/2022 09/02/2022 10/02/2022 11/02/2022
1 / \ / \ M \ / \ / \
2 / \ / \ M \ / \ / \
3 I02 I02 / \ I02 I02 U \ U \
4 I01 M M M / \ / \ / \
5 / \ / \ / \ / \ / \

The first mark in each column is the morning mark and the last mark is the PM mark. i.e. for record 3 on the 04/02/2022 the AM mark is U and the PM mark is \

The file will have in excess of 300 lines each week so should produce a final file with around 1500 lines (300 lines x 5 days a week)

Each results cell will always have two values recorded in each cell seperated by a space i.e. /[space]\ or M[space]\ The data will be extracted on a weekly basis with the column headers having dates that link to a day of the week, Monday through Friday.

For data analysis further down the pipeline I need to split the data so I end up with something like the below (replicated for the first two rows of my master data only):

Primary ID id Date Period Mark
1 1 07/02/2022 AM /
2 1 07/02/2022 PM |
3 1 08/02/2022 AM /
4 1 08/02/2022 PM |
5 1 09/02/2022 AM M
6 1 09/02/2022 PM |
7 1 10/02/2022 AM /
8 1 10/02/2022 PM |
9 1 11/02/2022 AM /
10 1 11/02/2022 PM |
11 2 07/02/2022 AM /
12 2 07/02/2022 PM |
13 2 08/02/2022 AM /
14 2 08/02/2022 PM |
15 2 09/02/2022 AM M
16 2 09/02/2022 PM |
17 2 10/02/2022 AM /
18 2 10/02/2022 PM |
19 2 11/02/2022 AM /
20 2 11/02/2022 PM |

I've used an array formula using query, split and flatten as explained in this topic to produce the following output Google Sheets - Transpose & Query

using this formula:

=ARRAYFORMULA(QUERY({"Student", "Date","Mark"; IFERROR(SPLIT(FLATTEN(IF(Master!A2:T="",,Master!A2:A&"♦"&Master!I1:M1&"♦"&Master!I2:M)), "♦"))}, "where Col1 > 0", 1))
id Date Mark
1 07/02/2022 /|
1 08/02/2022 /|
1 09/02/2022 M|

The problem I am having is trying to create the duplicate line for the AM/PM element. It dosetn exist as an element on its own, and can oly be attributed by the location of the text element in a cell (first element AM, second element PM)

CodePudding user response:

Here is a shared sheet with a possible solution to your problem.

It uses the SPLIT()/FLATTEN() technique I pioneered ( :) ) but you need a nested version to make it work with your layout.

Here is the formula in it's entirety, but this sample is an easier place from which to understand it and will stay up indefinitely. It should not be important to do it all with one formula (including headers) since this is a proof of concept.

=ARRAYFORMULA(QUERY(IFERROR(SPLIT(FLATTEN(FLATTEN(OP.Data!A2:A100&"|"&OP.Data!B1:1&"|")&{"AM|","PM|"}&SPLIT(FLATTEN(OP.Data!B2:100)," ")),"|",0,0)),"where Col1 is not null and Col2 is not null"))
  • Related