I have some strings like these:
0015/Cnt.A/2021/EX. Mmj tech
021/Cnt.B/2021/EX.Mm logs
31/ Cgt.A / 2020 / PK Jap
453/ Nnt.A / 2020 / WK Jap pom sc
13/Wnt.A/2021/ LO.Mm pom
1911/Cno.A/2021/PQ Mm ris dMn
and I want to extract output like this:
0015/Cnt.A/2021/EX. Mmj
021/Cnt.B/2021/EX.Mm
31/ Cgt.A / 2020 / PK Jap
453/ Nnt.A / 2020 / WK Jap
13/Wnt.A/2021/ LO.Mm
1911/Cno.A/2021/PQ Mm
I have tried this pattern:
[0-9]{1,}\/[a-zA-Z.\s-]{1,}\/[0-9\s]{1,}\/[a-zA-Z\s]+[\.\s]+[a-zA-Z]{1,}
but it can't handle the 4th and 6th strings. Can anyone fix that pattern, and maybe make it more efficient?
Edited:
The strings follow this rule: number/letters with dot or space/year/letters with dot or space
CodePudding user response:
The pattern that matches all text up to the last slash and then only two words separated by whitespace or a dot is
.*\/\s*[a-zA-Z]+[\s.]+[a-zA-Z]+
or, with \w+ in place of the letter classes,
.*\/\s*\w+[\s.]+\w+
If you need to keep the initial regex part for stricter validation, use
[0-9]+\/[a-zA-Z.\s-]+\/[0-9\s]+\/\s*\w+[\s.]+\w+
Details:
.*\/ - any zero or more chars other than line break chars, as many as possible, then a / char
\s* - zero or more whitespaces
[a-zA-Z]+ - one or more ASCII letters
[\s.]+ - one or more whitespaces/dots
[a-zA-Z]+ - one or more ASCII letters.
\w+ would match one or more letters, digits, or underscores.
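As a quick check, here is a minimal Python sketch that runs the first pattern over the sample strings from the question. It assumes the quantifiers are + (as the "one or more" descriptions indicate); the helper name extract is only illustrative and not part of the original answer.

```python
import re

# Sample strings copied from the question.
lines = [
    "0015/Cnt.A/2021/EX. Mmj tech",
    "021/Cnt.B/2021/EX.Mm logs",
    "31/ Cgt.A / 2020 / PK Jap",
    "453/ Nnt.A / 2020 / WK Jap pom sc",
    "13/Wnt.A/2021/ LO.Mm pom",
    "1911/Cno.A/2021/PQ Mm ris dMn",
]

# Greedy .* runs to the last slash; after it come two letter runs
# joined by one or more whitespace/dot characters.
loose = re.compile(r".*\/\s*[a-zA-Z]+[\s.]+[a-zA-Z]+")


def extract(pattern, text):
    """Return the matched prefix of text, or None if the pattern does not match."""
    m = pattern.match(text)
    return m.group(0) if m else None


results = [extract(loose, s) for s in lines]
for r in results:
    print(r)
```

Because .* is greedy, the match always anchors on the last slash, so any trailing words after the first two are simply left out of the match.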
Now, to accommodate the number/letter with dot or space/year/letter with dot or space rule:
\d+\/\s*[a-zA-Z]+(?:\.[a-zA-Z]+)*\s*\/\s*[0-9]{4}\s*\/\s*\w+[\s.]+\w+
Details:
\d+ - one or more digits
\/ - a / char
\s* - zero or more whitespaces
[a-zA-Z]+(?:\.[a-zA-Z]+)* - one or more letters, then zero or more repetitions of a dot and one or more letters
\s*\/\s* - zero or more whitespaces, /, zero or more whitespaces
[0-9]{4} - four digits
\s*\/\s* - zero or more whitespaces, /, zero or more whitespaces
\w+[\s.]+\w+ - one or more word chars, one or more whitespaces/dots, one or more word chars.
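The stricter pattern can be sketched in Python with re.VERBOSE, so that each piece sits next to its description. This assumes the quantifiers are + as described above; the two malformed strings at the end are made-up examples, not data from the question.

```python
import re

# Sample strings copied from the question.
lines = [
    "0015/Cnt.A/2021/EX. Mmj tech",
    "021/Cnt.B/2021/EX.Mm logs",
    "31/ Cgt.A / 2020 / PK Jap",
    "453/ Nnt.A / 2020 / WK Jap pom sc",
    "13/Wnt.A/2021/ LO.Mm pom",
    "1911/Cno.A/2021/PQ Mm ris dMn",
]

# Same pattern as in the answer, spread out with re.VERBOSE
# (whitespace outside character classes is ignored in this mode).
strict = re.compile(
    r"""
    \d+                          # one or more digits
    \/ \s*                       # a slash, then optional whitespace
    [a-zA-Z]+ (?:\.[a-zA-Z]+)*   # letters, then optional ".letters" groups
    \s* \/ \s*                   # a slash with optional whitespace around it
    [0-9]{4}                     # four-digit year
    \s* \/ \s*                   # a slash with optional whitespace around it
    \w+ [\s.]+ \w+               # two words joined by whitespace/dots
    """,
    re.VERBOSE,
)

matches = [m.group(0) if (m := strict.match(s)) else None for s in lines]
for value in matches:
    print(value)

# Hypothetical malformed inputs: a two-digit year and a non-numeric first field.
bad_year = strict.match("12/Cnt.A/21/EX. Mmj")
bad_number = strict.match("abc/Cnt.A/2021/EX. Mmj")
```

Unlike the .* version, this one rejects strings whose fields do not follow the rule, at the cost of being tied to exactly four slash-separated parts.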