Every corner of Julia's documentation is filled with reminders to "avoid global scope variables". But I fail to see how this could be beneficial even in some of the most common data analysis scenarios, probably due to a misunderstanding regarding how Julia's compiler works.
For example, one function I use checks whether each token of a document belongs to a huge lexicon of acceptable tokens. Currently, I use something like this:
using CSV, DataFrames, Chain  # Chain provides the @chain macro
accepted_tokens = @chain begin
    CSV.read("accepted_tokens.csv", DataFrame)
    Set{String}(_.tokens)
end
function redact_document(doc::String)
    tokens = split(doc, " ")
    redacted_tokens = [token in accepted_tokens ? token : "REDACTED" for token in tokens]
    return join(redacted_tokens, " ")  # join takes the collection first, then the delimiter
end
Now, since redact_document is the only function that uses accepted_tokens, I could of course just assign the variable inside the function, like this:
function redact_document(doc::String)
    accepted_tokens = @chain begin
        CSV.read("accepted_tokens.csv", DataFrame)
        Set{String}(_.tokens)
    end
    tokens = split(doc, " ")
    redacted_tokens = [token in accepted_tokens ? token : "REDACTED" for token in tokens]
    return join(redacted_tokens, " ")  # join takes the collection first, then the delimiter
end
The reason I don't do this is that, as far as I can tell, accepted_tokens would then be assigned every time redact_document is called. That seems like a total waste of time, since I'd have to read a huge file from disk on every call instead of creating the variable just once (albeit in the global scope). I also don't want to declare accepted_tokens as a constant, since I might want to tweak the lexicon as I develop my script.
Am I right in my reading of the code? Or, as I suspect, is the compiler smarter than I take it to be, and should I still be wrapping my variables within the functions that use them?
CodePudding user response:
While everything has already been said in the comments, for cleanliness your code should look like this (pass accepted_tokens as an argument rather than using a global variable):
function redact_document(doc::AbstractString, accepted_tokens::AbstractSet{<:AbstractString})
    tokens = split(doc, " ")
    redacted_tokens = [token in accepted_tokens ? token : "REDACTED" for token in tokens]
    return join(redacted_tokens, " ")  # join takes the collection first, then the delimiter
end
The type declarations for function arguments are optional (they do not affect performance), but if you use them, it is usually better to use their abstract counterparts.
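As a rough sketch of how this fits together, the lexicon can be built once at top level and then handed to the function on every call, so nothing is re-read from disk and the function body never touches a non-constant global. The small in-memory lexicon and the sample documents below are made up for illustration; in the question's setup the set would come from the CSV file instead.

```julia
# The function from the answer: the lexicon is an argument, not a global.
function redact_document(doc::AbstractString, accepted_tokens::AbstractSet{<:AbstractString})
    tokens = split(doc, " ")
    redacted_tokens = [token in accepted_tokens ? token : "REDACTED" for token in tokens]
    return join(redacted_tokens, " ")
end

# Hypothetical stand-in for the lexicon loaded from accepted_tokens.csv.
accepted_tokens = Set(["the", "cat", "sat"])

# Build the set once, then reuse it for every document.
docs = ["the cat sat", "the dog sat"]
redacted = [redact_document(doc, accepted_tokens) for doc in docs]
```

Because the set arrives as an argument, Julia can compile the function for its concrete type (here Set{String}), and the lexicon can still be rebound or edited between calls while the script is being developed.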