Runtime: Lua 5.1.x compiled for ARM64, no C modules allowed
Example code, ready to run: https://paste.gg/p/anonymous/08f364480a5f470e9da610ab565e11c0
I need to concatenate a bunch of strings every X ms in a loop. From my understanding, Lua supports string interning, which means that string literals are "cached" and not allocated each time. Therefore, only direct calls to tostring() (or the .. sugar) will allocate. All other existing string values will be passed by reference.
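One way to check this assumption on a given build is a small measurement like the one below. It is only a sketch: allocatedBy is a made-up helper name, and the exact numbers depend on the Lua version and platform. Note that in Lua 5.1 every string is interned, not only literals, so the cost comes from creating a string whose contents do not exist yet.

```lua
-- Hypothetical helper: KB allocated while fn runs, with the collector paused.
local function allocatedBy(fn)
    collectgarbage("collect")
    collectgarbage("stop")
    local before = collectgarbage("count")
    fn()
    local delta = collectgarbage("count") - before
    collectgarbage("restart")
    return delta
end

local cached = "value: 42"

-- Re-using an existing string only copies a reference.
local reuse = allocatedBy(function()
    local s
    for i = 1, 1000 do s = cached end
end)

-- Each concatenation below produces a string that did not exist before.
local fresh = allocatedBy(function()
    local s
    for i = 1, 1000 do s = "value: " .. i end
end)

print(string.format("reuse: %.2f KB, fresh: %.2f KB", reuse, fresh))
```

On a typical build the first figure should stay near zero while the second grows with the number of distinct strings created.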
What I've done so far:
- eliminated all integer->string allocations (via LUT)
- although tostring(bool) does return an interned string from the cache, I eliminated that too
- created a pseudo-stringbuilder via a table that works via indices (~16B each)
- "pre-resized" said table to avoid the cost of associative addition and made it global so it is not collected and re-created each time
- used table.concat() for final big string concatenation
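For reference, the pattern described in the list above can be sketched roughly as follows. This is not the code from the paste: twoDigit, buffer, append and buildClock are illustrative names, and the sketch assumes values in the range 0..99.

```lua
-- Two-digit lookup table built once, so formatting 0..99 later needs no tostring call.
local twoDigit = {}
for i = 0, 99 do
    twoDigit[i] = string.format("%02d", i)
end

-- One buffer reused across calls; n tracks the logical length.
local buffer, n = {}, 0

local function append(s)
    n = n + 1
    buffer[n] = s
end

local function buildClock(h, m, s)
    n = 0                                   -- "clear" by resetting the length only
    append("Time taken: ")
    append(twoDigit[h]); append(":")
    append(twoDigit[m]); append(":")
    append(twoDigit[s]); append("\n")
    -- Concatenate only the first n slots; stale entries past n are ignored.
    return table.concat(buffer, "", 1, n)
end

print(buildClock(1, 2, 3))
```

Tracking the length explicitly avoids both a clearing loop and repeated use of the # operator. Taking one string per append call also avoids the arg table that Lua 5.1 builds on each call to a vararg function that reads arg. The final string returned by table.concat is still a fresh allocation.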
The final results still make me sad:
Allocated pre-concat: 2.486328125 KB
Allocated post-concat: 39.7451171875 KB
Total table meta bytes: 1544 B
Total tostring meta bytes: 273 B
Is there something I'm missing, or am I at the limit of Lua here?
CodePudding user response:
I assume that the problem you mention is linked to the memory consumption of the function CONTAINER.PopulateState. I think your code is OK, but you are not measuring the right things. I removed all the collectgarbage calls and gathered them into a single part of the code:
print("Allocated PRE-concat: " .. tostring(collectgarbage("count")))
-- First time
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" .. tostring(collectgarbage("count")))
collectgarbage("collect")
print("Allocated POST-concat AFTER-COLLECT:" .. tostring(collectgarbage("count")))
-- One more try
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" .. tostring(collectgarbage("count")))
collectgarbage("collect")
print("Allocated POST-concat AFTER-COLLECT:" .. tostring(collectgarbage("count")))
-- One more try
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" .. tostring(collectgarbage("count")))
collectgarbage("collect")
print("Allocated POST-concat AFTER-COLLECT:" .. tostring(collectgarbage("count")))
The results are very different and make more sense:
Allocated PRE-concat: 48.70703125
Allocated POST-concat BEFORE-COLLECT:54.3232421875
Allocated POST-concat AFTER-COLLECT:51.8515625
Allocated POST-concat BEFORE-COLLECT:54.5576171875
Allocated POST-concat AFTER-COLLECT:51.8515625
Allocated POST-concat BEFORE-COLLECT:54.5576171875
Allocated POST-concat AFTER-COLLECT:51.8515625
After the program has initialized and before the first call to CONTAINER.PopulateState(), it already uses 48.7 KB.
The first call to CONTAINER.PopulateState() adds about 3 KB that appear to be persistent: this memory does not seem to be freed during program execution. This might be due to bytecode compilation, caching or internal use, or plausibly to the one-time growth of the reused strCACHE table.
The following executions of CONTAINER.PopulateState() typically use 2.7 KB of memory, and this memory is released each time. The program's behavior is consistent: repeated executions of CONTAINER.PopulateState() do not make the program use more memory. In fact, the memory temporarily used by CONTAINER.PopulateState() (2.7 KB) is negligible compared to the rest of the program (48 KB).
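The repeated print/collect sequence can also be wrapped in a small helper that separates memory released by a collection from memory that stays allocated. The following is a sketch under assumptions: measure and work are hypothetical names, and work is only a stand-in for CONTAINER.PopulateState.

```lua
-- Hypothetical helper: returns (transient KB, retained KB) for one call of fn.
local function measure(fn)
    collectgarbage("collect")             -- start from a clean heap
    collectgarbage("stop")                -- keep the collector from running inside fn
    local before = collectgarbage("count")
    fn()
    local peak = collectgarbage("count")
    collectgarbage("restart")
    collectgarbage("collect")             -- drop everything fn left unreachable
    local after = collectgarbage("count")
    return peak - after, after - before
end

-- Stand-in workload: builds and discards a string on every call.
local function work()
    local parts = {}
    for i = 1, 500 do parts[i] = "item" .. i end
    return table.concat(parts, ",")
end

for run = 1, 3 do
    local transient, retained = measure(work)
    print(string.format("run %d: transient %.2f KB, retained %.2f KB",
        run, transient, retained))
end
```

The transient figure corresponds to the BEFORE-COLLECT minus AFTER-COLLECT difference in the output above, and the retained figure to what a call leaves behind. The retained figure is usually largest on the first run.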
If you want better control over what is happening, you could implement this part in C and provide an interface to Lua.
Full code:
CONTAINER =
{
    Ver = "0.3",
    --- integer lookup for the DateTime
    timeLUT = {[0]="00",[1]="01",[2]="02",[3]="03",[4]="04",[5]="05",[6]="06",[7]="07",[8]="08",[9]="09"},
    strCACHE = { [100] = ""},
    SubStrA = "Unknown",
    SubAPrst = "ASjdasda",
}
for i = 10,99,1 do
    CONTAINER.timeLUT[i] = tostring(i)
end
DataBlob = {
    vAng = { x = 1.0, y = 2.0, z = 3.0},
    vPos = { x = 2131.0, y = 42.0, z = -433.0},
    Composite =
    {
        VARIANT1 = { isFirst = true, isMiddle = false, isLast = true },
        VARIANT2 = { isIgnored = true},
        VARIANT3 = { isAccurate = false },
        VARIANT4 = { bEnabled = false },
        VARIANT5 = { isLocked = false, ImpactV = 1.8 },
        VARIANT6 = { troCoWal = true },
        VARIANT7 = { isBroCal = false }
    }
}
Global = {
    isLocked = function(x) return false end,
    GetTimeStamp = function(x) return math.random() + math.random(1, 99) end,
    GetLocalTimeStamp = function(x) return math.random() + math.random(1, 99) end,
    GetTotalPTime = function(x) return math.random() + math.random(1, 99) end,
    GetDataBlob = function(x) return DataBlob end,
    GetName = function(x) return "AThing" end
}
function CONTAINER.PopulateState()
    local gcInit = 0
    local gcLast = 0
    -- Caching globals (math.mod is the Lua 5.0 name, kept as an alias of math.fmod in default 5.1 builds)
    local floor, mod, tostring = math.floor, math.mod, tostring
    local G = Global
    local intCache = CONTAINER.timeLUT
    local strBuilder = CONTAINER.strCACHE
    -- Fetching & caching data
    local locDB, Name = G.GetDataBlob(), G.GetName()
    local ts = G.GetTimeStamp()
    local lag = math.random() + math.random(1, 2)
    -- Local helpers
    local function sBool(bool)
        return bool and "1" or "0"
    end
    local t = 0
    function cAppend(cTbl, ...)
        -- arg[0] is nil, so the first iteration appends nothing
        for i=0, arg.n do
            cTbl[#cTbl + 1] = arg[i]
            t = t + 1
        end
    end
    function cClear(cTbl)
        for _=0, #cTbl do
            cTbl[#cTbl] = nil
        end
    end
    -- Populating table
    cClear(strBuilder)
    if locDB ~= nil then
        locDB = G.GetDataBlob()
        local PC = locDB.Composite
        local tp = G.GetTotalPTime()
        local d, h, m, s = floor(tp/86400), floor(mod(tp, 86400)/3600), floor(mod(tp,3600)/60), floor(mod(tp,60))
        cAppend(strBuilder, "[", Name, "]:\n",
            "Ang :", "(", tostring(locDB.vAng.x),",",tostring(locDB.vAng.y),",",tostring(locDB.vAng.z), ")\n",
            "Pos :", "(", tostring(locDB.vPos.x),",",tostring(locDB.vPos.y),",",tostring(locDB.vPos.z), ")\n",
            "isLocked: ", sBool(G.isLocked()), "\n")
        if (locDB.Composite["VARIANT1"] ~= nil) then
            cAppend(strBuilder, "isFirst / isLast: ", sBool(PC.VARIANT1.isFirst)," / ",sBool(PC.VARIANT1.isLast), "\n",
                "isMiddle: ", sBool(PC.VARIANT1.isMiddle), "\n")
        end
        if (locDB.Composite["VARIANT2"] ~= nil) then
            cAppend(strBuilder, "isIgnored: ", sBool(PC.VARIANT2.isIgnored), "\n")
        end
        if (locDB.Composite["VARIANT4"] ~= nil) then
            cAppend(strBuilder, "bEnabled: ", sBool(PC.VARIANT4.bEnabled), "\n")
        end
        if (locDB.Composite["VARIANT3"] ~= nil) then
            cAppend(strBuilder, "isAccurate: ", sBool(PC.VARIANT3.isAccurate), "\n")
        end
        if (locDB.Composite["VARIANT5"] ~= nil) then
            cAppend(strBuilder, "isLocked: ", sBool(PC.VARIANT5.isLocked), "\n",
                "ImpactV: ", tostring(PC.VARIANT5.ImpactV), "\n")
        end
        if (locDB.Composite["VARIANT6"]) then
            cAppend(strBuilder, "troCoWal: ", sBool(PC.VARIANT6.troCoWal), "\n")
        end
        if (locDB.Composite["VARIANT7"]) then
            cAppend(strBuilder, "isBroCal: ", sBool(PC.VARIANT7.isBroCal), "\n")
        end
        cAppend(strBuilder, "Time taken: ",intCache[d],":",intCache[h],":",intCache[m],":",intCache[s], "\n",
            "TS: ", tostring(ts), "\n",
            "local TS: ", tostring(G.GetLocalTimeStamp()),"\n",
            "Lag: ", string.format("%.5f", lag) , " ms\n",
            "Heap: ", tostring(gcLast), "KB\n")
        cAppend(strBuilder, "Alloc: ", tostring(gcLast-gcInit),"KB"," (v", CONTAINER.Ver, ")","\n",
            "Extra: ", CONTAINER.SubStrA, "_", CONTAINER.SubAPrst, "\n")
    end
end
print("Allocated PRE-concat: " .. tostring(collectgarbage("count")))
-- First time
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" .. tostring(collectgarbage("count")))
collectgarbage("collect")
print("Allocated POST-concat AFTER-COLLECT:" .. tostring(collectgarbage("count")))
-- One more try
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" .. tostring(collectgarbage("count")))
collectgarbage("collect")
print("Allocated POST-concat AFTER-COLLECT:" .. tostring(collectgarbage("count")))
-- One more try
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" .. tostring(collectgarbage("count")))
collectgarbage("collect")
print("Allocated POST-concat AFTER-COLLECT:" .. tostring(collectgarbage("count")))
CodePudding user response:
You want to minimize the number of intermediate string allocations in order to reduce GC pressure and make GC cycles less frequent. In this case, I suggest you limit yourself to one call to string.format with the string you want to format:
- The format string can be declared globally so that it is interned once.
- The string.format code can be read in the Lua source (str_format in lstrlib.c). What we can see from this code is that the intermediate string transformations are done on the C stack with a buffer of LUAL_BUFFERSIZE bytes. This size is declared in luaconf.h and can be customized according to your needs.
This approach should be the most efficient for your use case, as you drop all the intermediate steps (table insertions, table.concat, etc.).
local MY_STRING_FORMAT = [[My Very Big String
param-string-1 %d
param-string-2 %x
param-string-3 %f
param-string-4 %d
param-string-5 %d
]]
-- Param1..Param5 are placeholders: pass one argument per format specifier, in order
local result = string.format(MY_STRING_FORMAT,
Param1,
Param2,
Param3,
Param4,
Param5)
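Applied to a small part of the state dump from the question, the idea might look like the sketch below. STATE_FORMAT and formatState are illustrative names, and the layout is shortened. %.14g is the format Lua 5.1 uses by default when converting numbers to strings, and %02d takes over the role of the two-digit lookup table.

```lua
-- Format string declared once at module level (name and layout are illustrative).
local STATE_FORMAT =
    "[%s]:\nAng :(%.14g,%.14g,%.14g)\nisLocked: %s\nTime taken: %02d:%02d:%02d:%02d\n"

local floor = math.floor

-- Builds the whole block with a single string.format call.
local function formatState(name, ang, locked, tp)
    local d = floor(tp / 86400)
    local h = floor(tp % 86400 / 3600)
    local m = floor(tp % 3600 / 60)
    local s = floor(tp % 60)
    return string.format(STATE_FORMAT,
        name, ang.x, ang.y, ang.z,
        locked and "1" or "0",
        d, h, m, s)
end

print(formatState("AThing", { x = 1.0, y = 2.0, z = 3.0 }, false, 93784))
```

The only string created per call is the result itself. The numeric fields are converted inside string.format rather than through separate tostring calls.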