I'm looking for the most efficient way to split a Lua string into a table.
I found two possible ways, using gmatch or gsub, and tried to make them as fast as possible.
function string:split1(sep)
  sep = sep or ","
  local result = {}
  local i = 1
  -- Append the separator so the last field is also terminated by it.
  for c in (self..sep):gmatch("(.-)"..sep) do
    result[i] = c
    i = i + 1
  end
  return result
end
function string:split2(sep)
  sep = sep or ","
  local result = {}
  local pattern = string.format("([^%s]+)", sep)
  local i = 1
  -- gsub is used only for its callback; the substitution result is discarded.
  self:gsub(pattern, function (c)
    result[i] = c
    i = i + 1
  end)
  return result
end
The second option takes ~50% longer than the first.
What is the right way and why?
Added: I added a third function with the same pattern. It shows the best result.
function string:split3(sep)
  sep = sep or ","
  local result = {}
  local i = 1
  for c in self:gmatch(string.format("([^%s]+)", sep)) do
    result[i] = c
    i = i + 1
  end
  return result
end
"(.-)"..sep works with a sequence, that is, a separator of more than one character.
"([^" .. sep .. "]+)" works with a single character. In fact, it splits on each character in the sequence.
string.format("([^%s]+)", sep) is faster than "([^" .. sep .. "]+)".
string.format("(.-)%s", sep) shows almost the same time as "(.-)"..sep.
result[i]=c i=i+1 is faster than result[#result+1]=c and table.insert(result,c).
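The difference between the two pattern styles can be seen with a two-character separator. This is a minimal sketch: split_seq and split_set are hypothetical helper names, not part of the functions above, and it assumes the separator contains no Lua pattern magic characters.

```lua
-- Hypothetical helpers for illustration only.
-- Assumes sep contains no pattern magic characters such as "." or "%".
local function split_seq(s, sep)
  local result, i = {}, 1
  for c in (s .. sep):gmatch("(.-)" .. sep) do
    result[i] = c
    i = i + 1
  end
  return result
end

local function split_set(s, sep)
  local result, i = {}, 1
  for c in s:gmatch("([^" .. sep .. "]+)") do
    result[i] = c
    i = i + 1
  end
  return result
end

local input = "a, b,c"
local by_seq = split_seq(input, ", ")  -- splits only on the sequence ", "
local by_set = split_set(input, ", ")  -- splits on "," and on " " individually

print(table.concat(by_seq, "|"))  -- a|b,c
print(table.concat(by_set, "|"))  -- a|b|c
```

With a single-character separator such as "," the two helpers differ only in how they treat empty fields.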
Code for test:
local init = os.clock()
local initialString = [[1,2,3,"afasdaca",4,"acaac"]]
local temTable = {}
for i = 1, 1000 do
  table.insert(temTable, initialString)
end
local dataString = table.concat(temTable,",")
print("Creating data: ".. (os.clock() - init))
init = os.clock()
local data1 = {}
for i = 1, 1000 do
  data1 = dataString:split1(",")
end
print("split1: ".. (os.clock() - init))
init = os.clock()
local data2 = {}
for i = 1, 1000 do
  data2 = dataString:split2(",")
end
print("split2: ".. (os.clock() - init))
init = os.clock()
local data3 = {}
for i = 1, 1000 do
  data3 = dataString:split3(",")
end
print("split3: ".. (os.clock() - init))
Times:
Creating data: 0.000229
split1: 1.189397
split2: 1.647402
split3: 1.011056
CodePudding user response:
The gmatch version is preferred. gsub is intended for "global substitution" (string replacement) rather than iterating over matches, so it presumably has to do more work.
The comparison isn't quite fair, though, as your patterns differ: for gmatch you use "(.-)"..sep and for gsub you use "([^" .. sep .. "]+)". Why don't you use the same pattern for both? In newer Lua versions you could even use the frontier pattern.
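A like-for-like comparison could look like the following sketch, where both variants use the "(.-)"..sep pattern. The names split_gmatch_lazy and split_gsub_lazy are made up for this example, and the separator is assumed to contain no pattern magic characters. Each function can be dropped into the timing loop from the question in place of split1 and split2.

```lua
-- Hypothetical helpers: the same "(.-)sep" pattern driven by gmatch and by gsub.
-- Assumes sep contains no pattern magic characters.
local function split_gmatch_lazy(s, sep)
  local result, i = {}, 1
  for c in (s .. sep):gmatch("(.-)" .. sep) do
    result[i] = c
    i = i + 1
  end
  return result
end

local function split_gsub_lazy(s, sep)
  local result, i = {}, 1
  local padded = s .. sep
  -- The callback returns nothing, so gsub keeps each match unchanged;
  -- the rebuilt string is simply discarded.
  padded:gsub("(.-)" .. sep, function(c)
    result[i] = c
    i = i + 1
  end)
  return result
end

local a = split_gmatch_lazy("1,,2", ",")
local b = split_gsub_lazy("1,,2", ",")
print(table.concat(a, "|"))  -- 1||2
print(table.concat(b, "|"))  -- 1||2
```

Both calls return the same fields, including the empty one, so any remaining timing difference comes from gmatch versus gsub rather than from the pattern.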
The different patterns also lead to different behavior: the gmatch-based function with "(.-)"..sep will return empty matches, whereas the others won't. Note that the "([^" .. sep .. "]+)" pattern allows you to omit the parentheses, because gmatch returns the whole match when the pattern has no captures.
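Both points can be checked with a short sketch. The input string is made up for illustration, and the separator is hard-coded as "," to keep the patterns readable.

```lua
local input = "a,,b,"

-- "(.-)," with the separator appended: empty fields are kept.
local with_empty = {}
for c in (input .. ","):gmatch("(.-),") do
  with_empty[#with_empty + 1] = c
end

-- "[^,]+" without parentheses: gmatch yields the whole match,
-- and runs of separators never produce an empty field.
local without_empty = {}
for c in input:gmatch("[^,]+") do
  without_empty[#without_empty + 1] = c
end

print(#with_empty)     -- 4 fields: "a", "", "b", ""
print(#without_empty)  -- 2 fields: "a", "b"
```

Which behavior is right depends on the data: for CSV-like input where empty fields carry meaning, the first form preserves column positions.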