Regex: select until first space or comma occurrence

Time:06-13

I have the following examples of American addresses.

6301 Stonewood Dr Apt-728, Plano TX-75024 
13323 Maham Road, Apt # 1621, Dallas, TX 75240
17040 Carlson Drive, #1027 Parker, CO 80134
3465 25th St., San Francisco, CA 94110 

I want to extract the city from each address using a regex:

Plano, Dallas, Parker, San Francisco

I am using the following regex, which works for the first example:

(?<=[,.|•]).*\s+(?=[\s,.]?CA?[\s,.-]?[\d]{4,})

Can you help me make it work for the other examples as well?

CodePudding user response:

Another approach (assuming the structure of the ending is more or less fixed):

.+\s(\w+?),?.{4}\d{4,}

CodePudding user response:

The best I could come up with works from the end of the string: it looks for a run of non-whitespace characters (the portion you are looking for), followed by an optional comma, a whitespace character, a run of capital letters, then an optional space or dash, and finally a run of digits.

([^\s]+?)\,?\s[A-Z]+[\s\-]?\d+$

The first capture group is the target you are aiming for.

This is a live example with your use case embedded:

https://regexr.com/6nkq5

(As a side note, the demo on regexr may tell you the expression took more than 250ms and can't be rendered. Just edit the test case slightly to make it update and show you the actual result.)
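A quick way to check this pattern locally is sketched below. It assumes each "chain" in the description is a `+` quantifier; the names `TAIL_RE`, `ADDRESSES` and `last_token_before_state` are made up for the illustration. Because the group only accepts non-whitespace characters, a multi-word city such as San Francisco yields just its last word.

```python
import re

# Assumed form of the pattern: each "chain" in the description is read as "+".
TAIL_RE = re.compile(r"([^\s]+?)\,?\s[A-Z]+[\s\-]?\d+$")

# Sample addresses from the question, without trailing whitespace.
ADDRESSES = [
    "6301 Stonewood Dr Apt-728, Plano TX-75024",
    "13323 Maham Road, Apt # 1621, Dallas, TX 75240",
    "17040 Carlson Drive, #1027 Parker, CO 80134",
    "3465 25th St., San Francisco, CA 94110",
]


def last_token_before_state(address):
    """Return group 1 (the token right before the state code), or None."""
    match = TAIL_RE.search(address)
    return match.group(1) if match else None


tokens = [last_token_before_state(a) for a in ADDRESSES]
print(tokens)  # ['Plano', 'Dallas', 'Parker', 'Francisco']
```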

CodePudding user response:

You can use

,(?:\s*#\d+)?\s*([^\s,][^,]*)(?=\W+[A-Z]{2}\W*\d{4,}\s*$)

See the regex demo. The necessary value is in Group 1.

Details:

  • , - a comma
  • (?:\s*#\d+)? - an optional sequence of zero or more whitespaces, # and then one or more digits
  • \s* - zero or more whitespaces
  • ([^\s,][^,]*) - Group 1: a char other than whitespace and comma and then zero or more non-comma chars
  • (?=\W+[A-Z]{2}\W*\d{4,}\s*$) - a positive lookahead that requires (immediately on the right)
    • \W+ - one or more non-word chars
    • [A-Z]{2} - two uppercase ASCII letters
    • \W* - zero or more non-word chars
    • \d{4,} - four or more digits
    • \s* - zero or more whitespaces
    • $ - end of string.
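The breakdown above can be tried out with a short Python sketch. The `+` quantifiers follow the description ("one or more digits", "one or more non-word chars"); `CITY_RE`, `ADDRESSES` and `extract_city` are names invented for this example, not part of the answer.

```python
import re

# Pattern as described in the details list; the city is captured in Group 1.
CITY_RE = re.compile(
    r",(?:\s*#\d+)?\s*([^\s,][^,]*)(?=\W+[A-Z]{2}\W*\d{4,}\s*$)"
)

# Sample addresses from the question.
ADDRESSES = [
    "6301 Stonewood Dr Apt-728, Plano TX-75024",
    "13323 Maham Road, Apt # 1621, Dallas, TX 75240",
    "17040 Carlson Drive, #1027 Parker, CO 80134",
    "3465 25th St., San Francisco, CA 94110",
]


def extract_city(address):
    """Return the Group 1 capture, or None when the pattern does not match."""
    match = CITY_RE.search(address)
    return match.group(1) if match else None


cities = [extract_city(a) for a in ADDRESSES]
print(cities)  # ['Plano', 'Dallas', 'Parker', 'San Francisco']
```

The lookahead keeps the state code and ZIP out of the match, so only the city lands in the group even though `[^,]*` is greedy.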

CodePudding user response:

You can match a comma, then any chars other than A-Z or a comma, and capture from the first occurrence of A-Z.

,[^A-Z,]*?\b([A-Z][^,]*?),?\s*[A-Z]{2}[-\s]\d{4,}\s*$

Explanation

  • ,[^A-Z,]*?\b Match a comma, then as few chars as possible other than A-Z or a comma, up to a word boundary
  • ([A-Z][^,]*?) Capture group 1: match A-Z and then any char except a comma, as few as possible
  • ,?\s*[A-Z]{2} Match an optional comma, optional whitespace chars and 2 uppercase chars A-Z
  • [-\s]\d{4,}\s* Match either - or a whitespace char and then 4 or more digits followed by optional whitespace chars
  • $ end of string

Regex demo
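As a rough local check of the pattern in this answer, the sketch below runs it over the question's addresses, keeping the trailing spaces that two of them have. The names `LAZY_CITY_RE`, `SAMPLES` and `city_from` are hypothetical.

```python
import re

# Pattern from this answer, unchanged; the city is in capture group 1.
LAZY_CITY_RE = re.compile(
    r",[^A-Z,]*?\b([A-Z][^,]*?),?\s*[A-Z]{2}[-\s]\d{4,}\s*$"
)

# Addresses as posted, including the trailing space on the first and last.
SAMPLES = [
    "6301 Stonewood Dr Apt-728, Plano TX-75024 ",
    "13323 Maham Road, Apt # 1621, Dallas, TX 75240",
    "17040 Carlson Drive, #1027 Parker, CO 80134",
    "3465 25th St., San Francisco, CA 94110 ",
]


def city_from(address):
    """Return capture group 1, or None if the address does not fit the pattern."""
    match = LAZY_CITY_RE.search(address)
    return match.group(1) if match else None


found = [city_from(s) for s in SAMPLES]
print(found)  # ['Plano', 'Dallas', 'Parker', 'San Francisco']
```

The trailing `\s*$` is what lets the addresses with a space after the ZIP code still match.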

CodePudding user response:

As long as the city always comes right before the (exactly) two-letter state code, you can use that simple condition to match it.

(?<= )[A-Za-z ]+(?=,? [A-Z]{2})

Your match [A-Za-z ]+ will be found between

  • (?<= ): a space and
  • (?=,? [A-Z]{2}): an optional comma, a space and two uppercase letters

Check the demo here.
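Here is a minimal Python sketch of this lookaround approach, assuming the character class is repeated with `+`. There is no capture group, so the city is the whole match; `BETWEEN_RE`, `ADDRESS_LIST` and `first_city` are illustrative names only.

```python
import re

# Assumed form: letters and spaces between a preceding space and
# an optional comma, a space and two uppercase letters.
BETWEEN_RE = re.compile(r"(?<= )[A-Za-z ]+(?=,? [A-Z]{2})")

# Sample addresses from the question.
ADDRESS_LIST = [
    "6301 Stonewood Dr Apt-728, Plano TX-75024",
    "13323 Maham Road, Apt # 1621, Dallas, TX 75240",
    "17040 Carlson Drive, #1027 Parker, CO 80134",
    "3465 25th St., San Francisco, CA 94110",
]


def first_city(address):
    """Return the first whole match, or None when nothing matches."""
    match = BETWEEN_RE.search(address)
    return match.group(0) if match else None


matches = [first_city(a) for a in ADDRESS_LIST]
print(matches)  # ['Plano', 'Dallas', 'Parker', 'San Francisco']
```

Only the first match is taken here, so an earlier word followed by a space and two uppercase letters in the street part would win over the city.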
