Home > Mobile >  Extract many lines of text in a sheet cell using regex
Extract many lines of text in a sheet cell using regex

Time:09-05

I need to extract the lines of text without header and upto the line ,,,,,Total

INPUT

Sr.No ,Description of Goods ,HSN ,,City ,Unit ,,Unit Price ,Amount (in BP) 
1 ,29550762R3C0-Top- ST22E360 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00 
2 ,29550762R3C0-Bot- ST22E361 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00 
3 ,2945486609-G2 - Bot Stencil - ST22F199 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00 
4 ,2945457600-S1 - Top Stencil - ST22F224 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00 
5 ,2945457600-G2 - Bot Stencil - ST22F225 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00 
,,,,,Total ,,,25000.00 IN 
,,,,,Discount ,,,0.00 3 
,,,,Total Amount ,Before Tax ,,,"₹ 25,000.00 "
,,,,,IGST 0% ,,,₹ 0.00 
,,,,Total Amount ,After Tax: ,,,"₹ 25,000.00 "

EXPECTED OUTPUT

1 ,29550762R3C0-Top- ST22E360 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00 
2 ,29550762R3C0-Bot- ST22E361 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00 
3 ,2945486609-G2 - Bot Stencil - ST22F199 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00 
4 ,2945457600-S1 - Top Stencil - ST22F224 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00 
5 ,2945457600-G2 - Bot Stencil - ST22F225 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00 

I have tried with: =REGEXEXTRACT(A10,"(\n.*)"), The number of lines are not known in advance and can vary from cell to cell. Some may have one and some have 5 to 6.

CodePudding user response:

With your shown samples Only, please try following solutions. Here is the enter image description here

Formula in A3:

=QUERY(FLATTEN(SPLIT(A1,CHAR(10))),"where Col1 matches '^\d. '")

CodePudding user response:

I would simply match the lines which start with digits,space,comma till the end of line:

/^\d  *,.*$/gm

Demo on Regex101.com

^    Start of line
\d   One or more digits
 *   Zero or more spaces
,    Comma
.*   Any character
$    End of line

/gm  Flags: Global; Multi line
  • Related