How to merge AND & OR together

Time:10-31

You can check the regex101 page here.

I have a list of addresses in different formats, and they are not in English. Assume my list looks like this:

KENNEDY CAD. SİRKECİ ARABALI VAPUR İSKELESİ FATİH/ İSTANBUL
YAVUZTÜRK MAH. KARADENİZ CAD. NO:2 ÜSKÜDAR/ İSTANBUL
HAMİDİYE MAH. ALPEREN SOK. NO:15/2  ÇEKMEKÖY/ İSTANBUL
UĞUR MUMCU MAH. YUNUS EMRE CAD. NO:25 KARTAL/ İSTANBUL

The regex I've written is as follows:

(?:(?:\p{L}* M[Aa]?[Hh][. ])? *|(?:\p{L}* C[Aa]?[Dd][. ])? *)

My regex returns each character as a match, but I need to get these 4 matches:

KENNEDY CAD.
YAVUZTÜRK MAH. KARADENİZ CAD. 
HAMİDİYE MAH. 
UĞUR MUMCU MAH. YUNUS EMRE CAD. 

How can I solve that problem?
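A quick way to see why the pattern above reports a match at every character: both alternatives consist only of an optional group followed by ` *`, so the whole pattern can match the empty string, and the engine reports a zero-length match at each position. The sketch below checks this with Python's built-in `re` module. Because `re` has no `\p{L}`, it substitutes `[^\W\d_]` (a word character that is neither a digit nor an underscore). That is an approximation of "any letter", not an exact equivalent, and the names in the code are only illustrative.

```python
import re

# Assumption: [^\W\d_] stands in for \p{L}, which Python's built-in re lacks.
LETTER = r"[^\W\d_]"

# The question's pattern, translated; every piece of each alternative is optional.
QUESTION_PATTERN = re.compile(
    rf"(?:(?:{LETTER}* M[Aa]?[Hh][. ])? *|(?:{LETTER}* C[Aa]?[Dd][. ])? *)"
)

# The pattern accepts the empty string ...
print(QUESTION_PATTERN.fullmatch("") is not None)

# ... so on text with no spaces and no MAH/CAD it only produces empty matches,
# one per position (including the position at the end of the string).
sample = "ÜSKÜDAR/"
print(QUESTION_PATTERN.findall(sample))
```

Once every part of a pattern is optional, an empty match is always available and the engine reports it, so at least one piece has to be mandatory.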

Answer:

You can use

^\p{L}+(?:\s+\p{L}+)*\s+(?:M[Aa]?[Hh]|C[Aa]?[Dd])\.?(?:\s+\p{L}+(?:\s+\p{L}+)*\s+(?:M[Aa]?[Hh]|C[Aa]?[Dd]))*\.?

Details:

  • ^ - start of string
  • \p{L}+(?:\s+\p{L}+)* - a word and then zero or more whitespace-separated words
  • \s+ - one or more whitespaces
  • (?:M[Aa]?[Hh]|C[Aa]?[Dd]) - M, an optional A or a, and then H or h; or C, an optional A or a, and then D or d
  • \.? - an optional dot
  • (?:\s+\p{L}+(?:\s+\p{L}+)*\s+(?:M[Aa]?[Hh]|C[Aa]?[Dd]))* - zero or more repetitions of one or more whitespaces followed by the same words + MAH/CAD pattern described above
  • \.? - an optional dot
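A minimal sketch of the first pattern applied line by line in Python. It assumes `[^\W\d_]` as a stand-in for `\p{L}` (the built-in `re` module has no Unicode property classes, while the third-party `regex` package supports `\p{L}` directly). The names `LETTER`, `ABBR`, `ADDRESSES` and `street_prefixes` are hypothetical, chosen for illustration.

```python
import re

# Assumption: [^\W\d_] approximates \p{L} (built-in re has no \p{...} classes).
LETTER = r"[^\W\d_]"
# M + optional A/a + H/h, or C + optional A/a + D/d
ABBR = r"(?:M[Aa]?[Hh]|C[Aa]?[Dd])"

FIRST_PATTERN = re.compile(
    rf"{LETTER}+(?:\s+{LETTER}+)*\s+{ABBR}\.?"      # first "words ... MAH|CAD" chunk
    rf"(?:\s+{LETTER}+(?:\s+{LETTER}+)*\s+{ABBR})*"  # further chunks (no dot inside)
    r"\.?"                                           # optional final dot
)

ADDRESSES = [
    "KENNEDY CAD. SİRKECİ ARABALI VAPUR İSKELESİ FATİH/ İSTANBUL",
    "YAVUZTÜRK MAH. KARADENİZ CAD. NO:2 ÜSKÜDAR/ İSTANBUL",
    "HAMİDİYE MAH. ALPEREN SOK. NO:15/2  ÇEKMEKÖY/ İSTANBUL",
    "UĞUR MUMCU MAH. YUNUS EMRE CAD. NO:25 KARTAL/ İSTANBUL",
]

def street_prefixes(addresses):
    """Return the matched prefix of each address, or None if nothing matches."""
    result = []
    for address in addresses:
        # re.match only tries position 0, which plays the role of ^
        m = FIRST_PATTERN.match(address)
        result.append(m.group(0) if m else None)
    return result

for prefix in street_prefixes(ADDRESSES):
    print(prefix)
```

Matching each address separately keeps `\s+` from running across line breaks, which could happen if the whole list were searched as one multi-line string.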

See the regex demo. Or, a bit less precise and efficient, but shorter:

^(?:\s*[\p{L}\s]+(?:M[Aa]?[Hh]|C[Aa]?[Dd])\.?)+

See this regex demo. Details:

  • ^ - start of string
  • (?:\s*[\p{L}\s]+(?:M[Aa]?[Hh]|C[Aa]?[Dd])\.?)+ - one or more sequences of
    • \s* - zero or more whitespaces
    • [\p{L}\s]+ - one or more letters or whitespaces
    • (?:M[Aa]?[Hh]|C[Aa]?[Dd]) - M, an optional A or a and then h or H, or C, an optional A or a and then D or d
    • \.? - an optional dot
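The shorter pattern can be sketched the same way. Under the same `[^\W\d_]`-for-`\p{L}` assumption, `[\p{L}\s]` cannot be written as a single character class, so this sketch uses the alternation `(?:[^\W\d_]|\s)` instead. That is equivalent under the assumption but adds some backtracking. The names are hypothetical.

```python
import re

# Assumption: [^\W\d_] approximates \p{L}; because it cannot be merged with \s into
# one character class, [\p{L}\s] becomes the alternation (?:[^\W\d_]|\s).
LETTER_OR_SPACE = r"(?:[^\W\d_]|\s)"
# M + optional A/a + H/h, or C + optional A/a + D/d
ABBR = r"(?:M[Aa]?[Hh]|C[Aa]?[Dd])"

# One or more "letters/spaces, then MAH|CAD, then an optional dot" chunks.
SHORT_PATTERN = re.compile(rf"(?:\s*{LETTER_OR_SPACE}+{ABBR}\.?)+")

ADDRESSES = [
    "KENNEDY CAD. SİRKECİ ARABALI VAPUR İSKELESİ FATİH/ İSTANBUL",
    "YAVUZTÜRK MAH. KARADENİZ CAD. NO:2 ÜSKÜDAR/ İSTANBUL",
    "HAMİDİYE MAH. ALPEREN SOK. NO:15/2  ÇEKMEKÖY/ İSTANBUL",
    "UĞUR MUMCU MAH. YUNUS EMRE CAD. NO:25 KARTAL/ İSTANBUL",
]

def short_prefixes(addresses):
    """Return the matched prefix of each address, or None if nothing matches."""
    result = []
    for address in addresses:
        m = SHORT_PATTERN.match(address)  # anchored at position 0, like ^
        result.append(m.group(0) if m else None)
    return result

for prefix in short_prefixes(ADDRESSES):
    print(prefix)
```

On these four lines it returns the same prefixes as the longer pattern. It is "less precise" because `[\p{L}\s]+` can end in the middle of a word, so an M...H or C...D sequence inside a word could also be taken as the abbreviation.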

Answer:

Try (regex101):

^(?=.*C[Aa][Dd]\s*\.).*?C[Aa][Dd]\.|^.*?M[Aa][Hh]\s*\.

This matches everything from the start of the string up to and including the first CAD., or, if the string contains no CAD., up to and including the first MAH.
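The lookahead approach uses no Unicode letter class, so the pattern carries over to Python's `re` unchanged. Here is a sketch using `re.findall` with `re.MULTILINE`, so that `^` matches at the start of every line of a multi-line string. The variable names are illustrative.

```python
import re

# The answer's pattern, unchanged: if the line contains CAD + optional whitespace + "."
# (checked by the lookahead), match lazily up to the first "CAD."; otherwise match
# lazily up to the first MAH + optional whitespace + ".".
LOOKAHEAD_PATTERN = re.compile(
    r"^(?=.*C[Aa][Dd]\s*\.).*?C[Aa][Dd]\.|^.*?M[Aa][Hh]\s*\.",
    re.MULTILINE,  # ^ matches after every newline; "." still stops at newlines
)

TEXT = "\n".join([
    "KENNEDY CAD. SİRKECİ ARABALI VAPUR İSKELESİ FATİH/ İSTANBUL",
    "YAVUZTÜRK MAH. KARADENİZ CAD. NO:2 ÜSKÜDAR/ İSTANBUL",
    "HAMİDİYE MAH. ALPEREN SOK. NO:15/2  ÇEKMEKÖY/ İSTANBUL",
    "UĞUR MUMCU MAH. YUNUS EMRE CAD. NO:25 KARTAL/ İSTANBUL",
])

for match in LOOKAHEAD_PATTERN.findall(TEXT):
    print(match)
```

Because `.` does not match a newline without `re.DOTALL`, the lookahead only inspects the current line. So a CAD. on a later line cannot affect a line that contains only MAH.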
