Global variables always detrimental to Julia's performance?

Time:10-26

Every corner of Julia's documentation is filled with reminders to "avoid global scope variables". But I fail to see how avoiding them could be beneficial in some of the most common data analysis scenarios, probably because I misunderstand how Julia's compiler works.

For example, one function I use checks whether each token of a document belongs to a huge lexicon of acceptable tokens. Currently, I use something like this:

using CSV, DataFrames, Chain  # Chain provides the @chain macro

accepted_tokens = @chain begin
    CSV.read("accepted_tokens.csv", DataFrame)
    Set{String}(_.tokens)
end

function redact_document(doc::String)
    tokens = split(doc, " ")
    redacted_tokens = [token in accepted_tokens ? token : "REDACTED" for token in tokens]
    return join(redacted_tokens, " ")  # join takes the collection first, then the delimiter
end

Now, since redact_document is the only function that uses accepted_tokens, I could of course just assign the variable inside the function, like this:

function redact_document(doc::String)
    accepted_tokens = @chain begin
        CSV.read("accepted_tokens.csv", DataFrame)
        Set{String}(_.tokens)
    end

    tokens = split(doc, " ")
    redacted_tokens = [token in accepted_tokens ? token : "REDACTED" for token in tokens]
    return join(redacted_tokens, " ")  # join takes the collection first, then the delimiter
end

The reason I don't do this is that, as far as I can tell, accepted_tokens would then be assigned each time redact_document is called. That seems like a total waste of time, since I'd have to read a huge file from disk on every call instead of creating the variable just once (albeit in the global scope). I also don't want to declare accepted_tokens as a constant, since I might want to tweak the lexicon as I develop my script.

Am I right in my reading of the code? Or, as I suspect, is the compiler smarter than I take it to be, so that I should still be wrapping my variables within the functions that use them?

Answer:

Everything has already been said in the comments, but for the sake of cleanliness, your code should look like this (pass accepted_tokens as an argument rather than using a global variable):


function redact_document(doc::AbstractString, accepted_tokens::AbstractSet{<:AbstractString})
    tokens = split(doc, " ")
    redacted_tokens = [token in accepted_tokens ? token : "REDACTED" for token in tokens]
    return join(redacted_tokens, " ")  # join takes the collection first, then the delimiter
end
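
As a usage sketch, the lexicon can be built once at the top level and then handed to the function on every call, so nothing is re-read from disk. The small inline Set below is only a stand-in for the CSV file from the question, and the sample documents are made up for illustration:

```julia
# Hypothetical stand-in for the lexicon loaded from accepted_tokens.csv.
accepted_tokens = Set(["the", "cat", "sat"])

function redact_document(doc::AbstractString, accepted_tokens::AbstractSet{<:AbstractString})
    tokens = split(doc, " ")
    redacted_tokens = [token in accepted_tokens ? token : "REDACTED" for token in tokens]
    return join(redacted_tokens, " ")
end

# The lexicon is created once; each call only receives a reference to it.
before = redact_document("the dog sat", accepted_tokens)

# The lexicon can still be tweaked during development without redefining anything.
push!(accepted_tokens, "dog")
after = redact_document("the dog sat", accepted_tokens)
```

Because the set is passed by reference, calling the function does not copy it, and changes made with push! are visible to the next call.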

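The type declarations for function arguments are optional and do not affect performance, because Julia compiles a specialized version of the method for the concrete argument types it is called with. If you do use them, it is usually better to use their abstract counterparts.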
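
A minimal sketch of why the abstract annotations are more convenient: a SubString is an AbstractString but not a String, so a method annotated with doc::String would reject it with a MethodError, while the abstract version accepts it. The lexicon and text here are invented for the example:

```julia
function redact_document(doc::AbstractString, accepted_tokens::AbstractSet{<:AbstractString})
    tokens = split(doc, " ")
    redacted_tokens = [token in accepted_tokens ? token : "REDACTED" for token in tokens]
    return join(redacted_tokens, " ")
end

# Hypothetical lexicon for illustration only.
lexicon = Set(["alpha", "beta"])

# A slice of a larger text is a SubString{String}, not a String.
text = "xx alpha gamma"
piece = SubString(text, 4)  # the part of `text` starting at index 4

from_substring = redact_document(piece, lexicon)
from_string = redact_document("alpha gamma", lexicon)
```

Both calls go through the same method definition. Julia still compiles separate specialized code for the String and SubString{String} cases.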
