Lua split string to table

Time:05-29

I'm looking for the most efficient way to split a Lua string into a table.

I found two possible ways using gmatch or gsub and tried to make them as fast as possible.

function string:split1(sep)
    local sep = sep or ","
    local result = {}
    local i = 1
    for c in (self..sep):gmatch("(.-)"..sep) do
        result[i] = c
        i = i + 1
    end
    return result
end

function string:split2(sep)
   local sep = sep or ","
   local result = {}
   local pattern = string.format("([^%s]+)", sep)
   local i = 1
   self:gsub(pattern, function (c) result[i] = c; i = i + 1 end)
   return result
end

The second option takes ~50% longer than the first.

What is the right way and why?

Added: I added a third function that uses gmatch with the same pattern as split2. It gives the best result.

function string:split3(sep)
    local sep = sep or ","
    local result = {}
    local i = 1
    for c in self:gmatch(string.format("([^%s]+)", sep)) do
        result[i] = c
        i = i + 1
    end
    return result
end

"(.-)"..sep treats sep as a sequence, so it works with multi-character separators.

"([^" .. sep .. "]+)" works on single characters: every character in sep is treated as a separator on its own.

string.format("([^%s]+)", sep) is faster than "([^" .. sep .. "]+)".

string.format("(.-)%s", sep) takes almost the same time as "(.-)"..sep.

result[i]=c i=i+1 is faster than both result[#result+1]=c and table.insert(result,c).
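
The difference between the two patterns (sequence versus per-character) shows up with a multi-character separator. The following is a minimal sketch, not part of the original benchmark: the names split_sequence and split_charset are hypothetical, and it assumes sep contains no pattern magic characters such as "." or "%".

```lua
-- Hypothetical helper: treats sep as one sequence, like split1.
local function split_sequence(s, sep)
    local result, i = {}, 1
    for field in (s .. sep):gmatch("(.-)" .. sep) do
        result[i] = field
        i = i + 1
    end
    return result
end

-- Hypothetical helper: treats every character of sep as a separator, like split3.
local function split_charset(s, sep)
    local result, i = {}, 1
    for field in s:gmatch("([^" .. sep .. "]+)") do
        result[i] = field
        i = i + 1
    end
    return result
end

print(table.concat(split_sequence("a::b:c", "::"), "|"))  --> a|b:c
print(table.concat(split_charset("a::b:c", "::"), "|"))   --> a|b|c
```

With "::" as the separator, the sequence version keeps "b:c" together, while the character-class version also splits on the single ":".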

Code for test:

local init = os.clock()
local initialString = [[1,2,3,"afasdaca",4,"acaac"]]
local temTable = {}
for i = 1, 1000 do
    table.insert(temTable, initialString)
end
local dataString = table.concat(temTable,",")
print("Creating data: ".. (os.clock() - init))
    
init = os.clock()
local data1 = {}
for i = 1, 1000 do
    data1 = dataString:split1(",")
end
print("split1: ".. (os.clock() - init))

init = os.clock()
local data2 = {}
for i = 1, 1000 do
    data2 = dataString:split2(",")
end
print("split2: ".. (os.clock() - init))

init = os.clock()
local data3 = {}
for i = 1, 1000 do
    data3 = dataString:split3(",")
end
print("split3: ".. (os.clock() - init))

Times:

Creating data: 0.000229
split1: 1.189397
split2: 1.647402
split3: 1.011056

Answer:

The gmatch version is preferred. gsub is intended for "global substitution" - string replacement - rather than iterating over matches; accordingly it presumably has to do more work.

The comparison isn't quite fair though, as your patterns differ: for gmatch you use "(.-)"..sep and for gsub you use "([^" .. sep .. "]+)". Why don't you use the same pattern for both? In newer Lua versions you could even use the frontier pattern.
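
To compare like with like, the "(.-)"..sep pattern can also be driven through gsub. This is only a sketch under the assumption that sep has no pattern magic characters; the function names are hypothetical and no timings are claimed for it.

```lua
-- Hypothetical gmatch variant using the "(.-)"..sep pattern.
local function split_gmatch_seq(s, sep)
    local result, i = {}, 1
    for field in (s .. sep):gmatch("(.-)" .. sep) do
        result[i] = field
        i = i + 1
    end
    return result
end

-- Hypothetical gsub variant using exactly the same pattern.
local function split_gsub_seq(s, sep)
    local result, i = {}, 1
    local subject = s .. sep
    -- The callback returns nothing, so gsub keeps each match unchanged;
    -- it still returns a result string and a match count, which are ignored here.
    subject:gsub("(.-)" .. sep, function (field)
        result[i] = field
        i = i + 1
    end)
    return result
end

print(table.concat(split_gmatch_seq("a,,b", ","), "|"))  --> a||b
print(table.concat(split_gsub_seq("a,,b", ","), "|"))    --> a||b
```

Both variants return the same fields, so timing them against each other isolates the cost of gsub versus gmatch rather than the cost of the pattern.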

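The different patterns also lead to different behavior: the "(.-)"..sep version returns empty fields (for example, between two adjacent separators), whereas the "([^" .. sep .. "]+)" versions skip them. Note that with the "([^" .. sep .. "]+)" pattern you can omit the parentheses, because gmatch and gsub pass the whole match when a pattern has no captures.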
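
A small sketch of that behavioral difference, using a hypothetical collect helper and an input with two adjacent separators. The second pattern is written without parentheses to show that the whole match is returned when there are no captures.

```lua
-- Hypothetical helper: gathers every gmatch result for a pattern into a table.
local function collect(s, pattern)
    local result, i = {}, 1
    for field in s:gmatch(pattern) do
        result[i] = field
        i = i + 1
    end
    return result
end

local input, sep = "a,,b", ","
local with_empty = collect(input .. sep, "(.-)" .. sep)   -- keeps the empty field
local without_empty = collect(input, "[^" .. sep .. "]+") -- skips it, no parentheses needed

print(#with_empty)     --> 3
print(#without_empty)  --> 2
```

Which behavior is right depends on the data: for CSV-like input where empty columns matter, the version that keeps empty fields is usually the one you want.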
