Home > OS >  Extracting text between Number and Parantheses in a String field in Excel
Extracting text between Number and Parantheses in a String field in Excel

Time:11-20

I have a bunch of values like the one below in a text field in Excel and was wondering if anyone knows a method of extracting the Suburb name (i.e. Liverpool) which is usually before the first parantheses and after the last number i.e. postcode (3860 in this case)

PARK RD INT OF QUEENS DR LONDON 3860 LIVERPOOL (AGA) VIC

Thanks in advance

CodePudding user response:

Put your complex value into A1 and try this in B1:

=IF(ISERROR(FIND(" ";LEFT(A1;FIND(" (";A1)-1)));LEFT(A1;FIND(" (";A1)-1);RIGHT(LEFT(A1;FIND(" (";A1)-1);LEN(LEFT(A1;FIND(" (";A1)-1))-FIND("~";SUBSTITUTE(LEFT(A1;FIND(" (";A1)-1);" ";"~";LEN(LEFT(A1;FIND(" (";A1)-1))-LEN(SUBSTITUTE(LEFT(A1;FIND(" (";A1)-1);" ";""))))))

This will give you the word before (.
In your example it will return Liverpool. But fyi: this will work for single words.

CodePudding user response:

Try below formula-

=TRIM(RIGHT(SUBSTITUTE(TRIM(LEFT(SUBSTITUTE(A1,"(",REPT(" ",300)),300))," ",REPT(" ",100)),100))

You may also try-

=FILTERXML("<t><s>"&SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(A1,"(","</s><s>")&"</s></t>","//s[1]")," ","</s><s>")&"</s></t>","//s[last()]")

enter image description here

CodePudding user response:

Office 365:

=LET(a,1 MATCH(1,0/ISNUMBER(0 MID(A1,SEQUENCE(LEN(A1)),1))),b,FIND("(",A1),TRIM(MID(A1,a,b-a)))

CodePudding user response:

This type of thing is best done in steps, to make it easier to understand and modify. That is:

  • A1: Text you want to extract from
  • B1: Position of first character after the last digit and space =IFERROR(MATCH(10^6, INDEX(--MID(A1, ROW( INDIRECT("1:"& LEN(A1))), 1), )), 0) 2
  • C1: Position of opening parenthesis =FIND("(",A1)
  • D1: Extract text =MID(A1,C1,D1-C1-1)

This works with multi-word text.

CodePudding user response:

Tried this and worked however there is a space at the beginning of every output but I feel that is still manageable

=IFERROR(LEFT(RIGHT(J13,LEN(J13)-MAX(IFERROR(FIND({1,2,3,4,5,6,7,8,9,0},J13,ROW(INDIRECT("1:"&LEN(J13)))),0))),MAX(FIND("(",RIGHT(J13,LEN(J13)-MAX(IFERROR(FIND({1,2,3,4,5,6,7,8,9,0},J13,ROW(INDIRECT("1:"&LEN(J13)))),0))))-1)),RIGHT(J13,LEN(J13)-MAX(IFERROR(FIND({1,2,3,4,5,6,7,8,9,0},J13,ROW(INDIRECT("1:"&LEN(J13)))),0))))

CodePudding user response:

Since you have Excel365, you could use:

enter image description here

Formula in B1:

=TEXTJOIN(" ",,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[position()<count(//s[starts-with(., '(')][1]/preceding::*) 1][position()>count(//s[.*0=0][last()]/preceding::*) 1]"))

The Xpath expression used means:

  • //s[position()<count(//s[starts-with(., '(')][1]/preceding::*) 1] - Get all nodes that are before the very first occurrence of an opening paranthesis.
  • [position()>count(//s[.*0=0][last()]/preceding::*) 1] - Of those returned nodes make sure the position is after the very last numeric node.

This means you'd only return those nodes that are in between the last numeric substring and the first opening paranthesis as per your request.

I'm still looking for a more sound way of writing this Xpath.

  • Related