Regular Expressions: Return Null when a group does not appear

Time:07-06

After a week of struggling with this problem, I need help with this pattern:

"^(?<Id>\d+) - (?<Agent>.*)(?<Registry>\(S (.*)\)) ?. \((?<Date>\d{2}[/]\d{2}[/]\d{4})\).?$"

My test with https://regex101.com/:

Case 1:

121971 - my text(S 8, H M 42670). (27/06/1974)

Match 1 0-46 121971 - my text(S 8, H M 42670). (27/06/1974)
Group 1 19-31 8, H M 42670
Group Id 0-6 121971
Group Agent 9-16 my text
Group Registry 16-32 (S 8, H M 42670)
Group Date 35-45 27/06/1974

Case 2:

2 - Lorem Ipsum. (19/12/2022).

regex101 returns "Your regular expression does not match the subject string."

I then removed the last part of the group, "(?(S (.*)))?.". With that change, Case 1 merges Agent and Registry into the same group, and Case 2 returns:

Match 1 0-30 2 - Lorem Ipsum. (19/12/2022).
Group Id 0-1 2
Group Agent 4-15 Lorem Ipsum
Group Date 18-28 19/12/2022

This is the expected output:

Group    | Case 1         | Case 2
Id       | 121971         | 2
Agent    | my text        | Lorem Ipsum
Registry | S 8, H M 42670 | [null]
Date     | 27/06/1974     | 19/12/2022

Thanks

Answer:

You can use

^(?<Id>\d+) - (?<Agent>.*?)(?:\((?<Registry>[^()]*)\))?\W*\((?<Date>\d{2}/\d{2}/\d{4})\)$

See the regex demo. If you have .? at the end to match an optional CR char, you can just use \r? (in case you compile the pattern with RegexOptions.Multiline option).
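To check the behaviour against both cases from the question, here is a minimal sketch in Python. It is a port, not the .NET code itself: Python writes named groups as (?P<Name>...) rather than (?<Name>...), and the helper name parse is hypothetical. One assumption is labeled in the code: an optional \.? is added before $ so that the trailing period in Case 2 is accepted.

```python
import re

# Python port of the pattern above; named groups use (?P<Name>...) in Python.
# Assumption: "\.?" before "$" is added so the trailing "." in Case 2 is accepted.
PATTERN = re.compile(
    r"^(?P<Id>\d+) - (?P<Agent>.*?)(?:\((?P<Registry>[^()]*)\))?\W*"
    r"\((?P<Date>\d{2}/\d{2}/\d{4})\)\.?$"
)


def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None


case1 = parse("121971 - my text(S 8, H M 42670). (27/06/1974)")
case2 = parse("2 - Lorem Ipsum. (19/12/2022).")

print(case1)
print(case2)  # Registry did not participate in the match, so it is None
```

In Python, a group that did not participate in the match is reported as None. In .NET, the equivalent check is match.Groups["Registry"].Success, which is false for such a group (its Value is an empty string), so the calling code can map that to null.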

Details:

  • ^(?<Id>\d+) - start of string, then Group "Id": one or more digits
  • " - " - a hyphen with a single regular space on each side
  • (?<Agent>.*?) - Group "Agent": any zero or more chars other than a newline char, as few as possible
  • (?:\((?<Registry>[^()]*)\))? - an optional group matching
    • \( - a ( char
    • (?<Registry>[^()]*) - Group "Registry": any zero or more chars other than ) and (
    • \) - a ) char
  • \W* - zero or more non-word chars
  • \( - a ( char
  • (?<Date>\d{2}/\d{2}/\d{4}) - Group "Date": two digits, /, two digits, / and four digits
  • \) - a ) char
  • $ - end of string.
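The breakdown above can also be kept next to the pattern itself. The following is a sketch, assuming a Python port in verbose mode (re.VERBOSE), where whitespace in the pattern is ignored and # starts a comment; the name VERBOSE_PATTERN is hypothetical. Literal spaces are written as [ ] because verbose mode would otherwise drop them. The pattern ends with \)$ exactly as listed, so the sample inputs here have nothing after the closing parenthesis of the date.

```python
import re

# Verbose-mode port of the pattern; each line mirrors one bullet of the breakdown.
VERBOSE_PATTERN = re.compile(
    r"""
    ^(?P<Id>\d+)                     # Group "Id": one or more digits
    [ ]-[ ]                          # a hyphen with a single space on each side
    (?P<Agent>.*?)                   # Group "Agent": as few chars as possible
    (?:\((?P<Registry>[^()]*)\))?    # optional parenthesized part; Registry is the text inside
    \W*                              # zero or more non-word chars
    \((?P<Date>\d{2}/\d{2}/\d{4})\)  # Group "Date": dd/dd/dddd inside parentheses
    $                                # end of string
    """,
    re.VERBOSE,
)

with_registry = VERBOSE_PATTERN.match("121971 - my text(S 8, H M 42670). (27/06/1974)")
without_registry = VERBOSE_PATTERN.match("2 - Lorem Ipsum. (19/12/2022)")

print(with_registry.group("Registry"))
print(without_registry.group("Registry"))  # None: the optional group was skipped
```

Because Agent is lazy and the Registry part is wrapped in an optional non-capturing group, the engine extends Agent one character at a time until the rest of the pattern can match, which is what keeps the parenthesized part out of Agent when it is present.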