After a week struggling with this problem, I need help with this pattern:
"^(?<Id>\d+) - (?<Agent>.*)(?<Registry>\(S (.*)\)) ?. \((?<Date>\d{2}[/]\d{2}[/]\d{4})\).?$"
My test with https://regex101.com/:
Case 1:
121971 - my text(S 8, H M 42670). (27/06/1974)
Match 1 | 0-46 | 121971 - my text(S 8, H M 42670). (27/06/1974) |
Group 1 | 19-31 | 8, H M 42670 |
Group Id | 0-6 | 121971 |
Group Agent | 9-16 | my text |
Group Registry | 16-32 | (S 8, H M 42670) |
Group Date | 35-45 | 27/06/1974 |
Case 2:
2 - Lorem Ipsum. (19/12/2022).
regex101 returns "Your regular expression does not match the subject string."
I removed the Registry part of the pattern, "(?<Registry>\(S (.*)\)) ?". Case 1 then joins Agent and Registry in the same group, and Case 2 returns:
Match 1 | 0-30 | 2 - Lorem Ipsum. (19/12/2022). |
Group Id | 0-1 | 2 |
Group Agent | 4-15 | Lorem Ipsum |
Group Date | 18-28 | 19/12/2022 |
This is the expected output:
Group | Case 1 | Case 2 |
---|---|---|
Id | 121971 | 2 |
Agent | my text | Lorem Ipsum |
Registry | S 8, H M 42670 | [null] |
Date | 27/06/1974 | 19/12/2022 |
Thanks
CodePudding user response:
You can use
^(?<Id>\d+) - (?<Agent>.*?)(?:\((?<Registry>[^()]*)\))?\W*\((?<Date>\d{2}/\d{2}/\d{4})\)$
See the regex demo. If you have `.?` at the end to match an optional CR char, you can just use `\r?` (in case you compile the pattern with the `RegexOptions.Multiline` option).
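As a rough illustration of the `\r?` suggestion, here is a minimal sketch in Python rather than .NET. That choice is an assumption made for the example: Python's `re` module spells named groups `(?P<Name>...)`, and its `re.MULTILINE` flag stands in for `RegexOptions.Multiline`. The two-line CRLF sample text is made up, and only the date part of the pattern is used to keep the sketch short.

```python
import re

# Date-only tail of the pattern, with and without an optional CR before the line end.
# Assumption: re.MULTILINE plays the role of RegexOptions.Multiline here.
WITH_CR = re.compile(r"\((?P<Date>\d{2}/\d{2}/\d{4})\)\r?$", re.MULTILINE)
WITHOUT_CR = re.compile(r"\((?P<Date>\d{2}/\d{2}/\d{4})\)$", re.MULTILINE)

# Hypothetical input with Windows (CRLF) line endings.
text = "1 - Foo (27/06/1974)\r\n2 - Bar (19/12/2022)\r\n"

# "$" only matches right before "\n", so the "\r" has to be consumed explicitly.
dates_with_cr = [m.group("Date") for m in WITH_CR.finditer(text)]
dates_without_cr = [m.group("Date") for m in WITHOUT_CR.finditer(text)]

print(dates_with_cr)
print(dates_without_cr)
```

In this sketch the variant without `\r?` finds nothing, because the CR sits between the closing parenthesis and the position where `$` can match.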
Details:
- `^` - start of string
- `(?<Id>\d+)` - Group "Id": one or more digits
- ` - ` - a hyphen enclosed with single regular spaces
- `(?<Agent>.*?)` - Group "Agent": any zero or more chars other than a newline char, as few as possible
- `(?:\((?<Registry>[^()]*)\))?` - an optional group matching
  - `\(` - a `(` char
  - `(?<Registry>[^()]*)` - Group "Registry": any zero or more chars other than `)` and `(`
  - `\)` - a `)` char
- `\W*` - zero or more non-word chars
- `\(` - a `(` char
- `(?<Date>\d{2}/\d{2}/\d{4})` - Group "Date": two digits, `/`, two digits, `/` and four digits
- `\)` - a `)` char
- `$` - end of string.
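To check the pattern against the two cases from the question, here is a small sketch. It is written in Python, not .NET, so the named groups are rewritten as `(?P<Name>...)`; that adaptation, the `parse` helper, and the `PATTERN_TRAILING` variant are assumptions introduced for this example. Note that Case 2 as posted ends with a period after the date, which a pattern ending in `\)$` does not accept, so the variant keeps the question's trailing `.?`.

```python
import re

# The answer's pattern, adapted to Python's (?P<Name>...) named-group syntax.
PATTERN = re.compile(
    r"^(?P<Id>\d+) - (?P<Agent>.*?)"
    r"(?:\((?P<Registry>[^()]*)\))?"
    r"\W*\((?P<Date>\d{2}/\d{2}/\d{4})\)$"
)

# Hypothetical variant: drop the final "$" and re-add it after the question's
# trailing ".?", so one extra character after the date is tolerated.
PATTERN_TRAILING = re.compile(PATTERN.pattern[:-1] + r".?$")


def parse(line, pattern=PATTERN):
    """Return the named groups as a dict, or None when the line does not match."""
    match = pattern.match(line)
    return match.groupdict() if match else None


case1 = parse("121971 - my text(S 8, H M 42670). (27/06/1974)")
case2 = parse("2 - Lorem Ipsum. (19/12/2022)")
case2_period = parse("2 - Lorem Ipsum. (19/12/2022).", PATTERN_TRAILING)

print(case1)
print(case2)
print(case2_period)
```

In this sketch the Registry entry for Case 2 comes back as `None`, which corresponds to `[null]` in the expected-output table, while Case 1 yields `S 8, H M 42670` without the surrounding parentheses.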