Home > Net >  Regex for comma-separated values of specific pattern
Regex for comma-separated values of specific pattern

Time:11-27

Given a string such as

cite{AA,BBB, C}  skip{DD}   cite{EE,F}

, I am trying to extract the comma-separated strings in the specific tag (in this case, named as cite) using Regex. Thus, the output for the above string should result in

AA
BBB
C
EE
F

I think /cite{(. ?)}/ selects strings in \cite{....} form, but how do we then split the string in this output? I find that some expressions like [^,(?! )] split the strings based on comma, but I cannot find a way to couple these two things.

CodePudding user response:

You could split on the comma using a lookbehind, checking for the cite{ prefix, for example:

  • enter image description here

    But outside of using a programming language, I'm not sure how you would capture what you want if it's not fixed length, in a single regex. Though if it is fixed-width, you could do something like this:

    • enter image description here

      CodePudding user response:

      Suppose the string were

      cite{AA,BBB, C}  skip{DD} GG  cite{EE,F}
           ^^ ^^^  ^                     ^^ ^
      

      We want to match the five substrings indicated by the carets.

      If the regex engine supports variable-length lookbehinds we could use the regular expression

      (?<=\bcite{[^{}]*?)[A-Z] (?=[^{}]*})
      

      Demo (click the "Context" tab at the link to see the matches).

      This regex can be broken down as follows.

      (?<=         # begin a positive lookbehind 
        \bcite     # match literal preceded by a word break
        {          # match character
        [^{}]*?    # match zero or more chars other than braces, lazily
      )            # end positive lookbehind
      [A-Z]        # match >= 1 uppercase letters
      (?=          # begin positive lookahead
        [^{}]*?    # match zero or more chars other than braces, then '}'    
      )            # end positive lookahead
      
  • Related