Extract substrings between quotation marks, but skip \" and turn it into " instead, in Lua

Time:06-28

I have this string

"argument \\\" \"some argument\" \"some argument with a quotation mark \\\" in here \""

which prints out as this

argument \" "some argument" "some argument with a quotation mark \" in here"

and I am trying to extract all of it, so that at the end it gets stored like this:

> [1] = "argument",
> [2] = """,
> [3] = "some argument",
> [4] = "some argument with a quotation mark " in here"

This is the code that I have so far.

function ExtractArgs(text)
    local skip = 0
    local arguments = {}
    local curString = ""

    for i = 1, text:len() do
        if (i <= skip) then continue end

        local c = text:sub(i, i)
        
        if (c == "\\") and (text:sub(i+1, i+1) == "\"") then
            continue
        end
        
        if (c == "\"") and (text:sub(i-1, i-1) ~= "\\") then
            local match = text:sub(i):match("%b\"\"")
            
            if (match) and (match:sub(#match-1,#match-1) ~= "\\") then
                curString = ""
                skip = i + #match
                arguments[#arguments + 1] = match:sub(2, -2)
            else
                curString = curString..c
            end
        elseif (c == " " and curString ~= "") then
            arguments[#arguments + 1] = curString
            curString = ""
        else
            if (c == " " and curString == "") then
                continue
            end

            curString = curString..c
        end
    end

    if (curString ~= "") then
        arguments[#arguments + 1] = curString
    end
    
    return arguments
end
print(ExtractArgs("argument \\\" \"some argument\" \"some argument with a quotation mark \\\" in here\""))

It correctly extracts a \" that is outside quotation marks, but not one that is inside quotation marks.

How can this be solved properly?

The regex \"([^\"\\]*(?:\\.[^\"\\]*)*)\" seems to work, but how can the same be done in Lua?

Answer:

A single Lua pattern cannot do this, but a chain of a few patterns can.
The text parameter must not contain the bytes \0, \1, or \2, because these characters are used as temporary placeholders during substitution.
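
As a minimal sketch of the placeholder idea, the first substitution can be run on its own. The helper name protect_escaped_quotes is an assumption made for illustration and is not part of the answer's code.

```lua
-- Step 1 in isolation: replace every escaped quote \" with the placeholder byte \1.
-- The pattern "\\?." consumes an optional backslash plus one character, so an
-- escaped backslash (\\) is consumed as a pair and is never misread as the start of \".
local function protect_escaped_quotes(text)
   -- The table replacement only rewrites the match \" ; every other match is kept as-is.
   local protected = text:gsub("\\?.", {['\\"'] = "\1"})
   return protected
end
```

The complete function chains this step with two more substitutions and a final split.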

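local function ExtractArgs(text)
   local arguments = {}
   for argument in
      -- 1) protect escaped quotes: \" -> \1 (an escaped backslash \\ is consumed as a pair)
      ('""'..text:gsub("\\?.", {['\\"']="\1"}))
      -- 2) prefix each quoted piece with \2 and turn whitespace outside quotes into \0
      :gsub('"(.-)"([^"]*)', function(q,n) return "\2"..q..n:gsub("%s+", "\0") end)
      -- 3) drop the \2 produced by the dummy '""' prefix, then split on \0
      :sub(2)
      :gmatch"%Z+"
   do
      -- restore quotes, remove the \2 markers, and unescape the remaining \x sequences
      argument = argument:gsub("\1", '"'):gsub("\2", ""):gsub("\\(.)", "%1")
      print(argument)
      arguments[#arguments+1] = argument
   end
   return arguments
end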

ExtractArgs[[argument \"\\ "" "some argument" "some argument with a quotation mark \" in here \\"]]

Output:

argument
"\

some argument
some argument with a quotation mark " in here \
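
For comparison, the same splitting can be sketched without patterns or placeholder bytes, by scanning the text one character at a time. This is a hypothetical alternative, not the method above: the name extract_args_scan and its internals are assumptions. It treats a backslash as making the next character literal, and an unescaped quote as toggling quoted mode.

```lua
-- Hypothetical alternative to the pattern chain: scan the text once, character by character.
local function extract_args_scan(text)
   local arguments = {}
   local buffer = {}        -- characters of the argument being built
   local in_quotes = false  -- true while inside "..."
   local started = false    -- true once the current argument exists (even if empty)
   local i, n = 1, #text

   -- Store the current argument, if any, and reset the buffer.
   local function flush()
      if started then
         arguments[#arguments + 1] = table.concat(buffer)
         buffer, started = {}, false
      end
   end

   while i <= n do
      local c = text:sub(i, i)
      if c == "\\" and i < n then
         -- A backslash makes the next character literal (\" -> ", \\ -> \).
         buffer[#buffer + 1] = text:sub(i + 1, i + 1)
         started = true
         i = i + 1
      elseif c == '"' then
         -- An unescaped quote toggles quoted mode; "" yields an empty argument.
         in_quotes = not in_quotes
         started = true
      elseif c:match("%s") and not in_quotes then
         -- Whitespace outside quotes ends the current argument.
         flush()
      else
         buffer[#buffer + 1] = c
         started = true
      end
      i = i + 1
   end
   flush()
   return arguments
end
```

Because it uses no placeholder bytes, this sketch places no restriction on \0, \1, or \2 appearing in the input.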