Home > Software design >  repeated header rows in a long CSV file - Can it be processed?
repeated header rows in a long CSV file - Can it be processed?

Time:12-22

I have a CSV file which has a list of data for every office in the company. The problem is the 'Office' name is the header. Is there a way to pick up the change of office and have that in another column? For example, source data as below:

New York
11/01/2022;09:44:29;Apples;10;$33
11/01/2022;09:45:21;Bananas;13;$21
12/01/2022;13:45:42;Carrots;12;$2
14/01/2022;10:12:19;Apples;11;$36

London
11/01/2022;11:24:19;Apples;5;$12
11/01/2022;12:40:61;Bananas;15;$17
13/01/2022;16:11:22;Carrots;1;$1
15/01/2022;09:12:11;Lettuce;14;$31
16/01/2022;09:56:31;Bananas;20;$81

Paris
11/01/2022;10:23:54;Apples;7;$2
11/01/2022;12:40:61;Bananas;56;$117
13/01/2022;13:22:22;Carrots;10;$10
14/01/2022;15:09:19;Beans;87;$331
16/01/2022;15:36:22;Apples;2;$8

etc

Required output from Power Query would be more like this. One continuous list with the office names as the first column for each row

New York 11/01/2022;09:44:29;Apples;10;$33
New York 11/01/2022;09:45:21;Bananas;13;$21
New York 12/01/2022;13:45:42;Carrots;12;$2
New York 14/01/2022;10:12:19;Apples;11;$36
London 11/01/2022;11:24:19;Apples;5;$12
London 11/01/2022;12:40:61;Bananas;15;$17
London 13/01/2022;16:11:22;Carrots;1;$1
London 15/01/2022;09:12:11;Lettuce;14;$31
London 16/01/2022;09:56:31;Bananas;20;$81
Paris 11/01/2022;10:23:54;Apples;7;$2
Paris 11/01/2022;12:40:61;Bananas;56;$117
Paris 13/01/2022;13:22:22;Carrots;10;$10
Paris 14/01/2022;15:09:19;Beans;87;$331
Paris 16/01/2022;15:36:22;Apples;2;$8

CodePudding user response:

Import the csv file in powerquery

enter image description here

Add column, custom column, with formula

= if [Column2]=null then [Column1] else null

if that returns all nulls, try

= if [Column2]="" then [Column1] else null

right click and duplicate that column, then right click the duplicate and fill down

Go back to the column you duplicated . Use the arrows atop the column to uncheck and filter out the non-nulls

Right click and remove extra columns. Re-arrange columns as needed

let Source = Csv.Document(File.Contents("C:\Temp\a.csv"),[Delimiter=";", Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Added Custom" = Table.AddColumn(Source, "Custom", each if [Column2]="" then [Column1] else null),
#"Duplicated Column" = Table.DuplicateColumn(#"Added Custom", "Custom", "Custom - Copy"),
#"Filled Down" = Table.FillDown(#"Duplicated Column",{"Custom - Copy"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([Custom] = null)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Custom"})
in  #"Removed Columns"

CodePudding user response:

A very easy and fast way is to use the filter button on Column2 in the Power Query editor and filter out nulls in that column, just like you would apply a filter in a normal table. That will remove the major heading rows, and if you look in the formula bar for that step, you will see the M code query which will be run each time you update the query for new data.

= Table.SelectRows(#"Filled Down", each ([Column2] <> null))

In plain language, that code will delete all rows when Column2 contains nulls.

  • Related