I am looking for a way to handle the case where a column name does not exist in a Julia DataFrame.
In more detail, let's say I have the following DataFrame and list:
df = DataFrame(id= "12345", description= rand(5));
err_Li = ["12345"]
I need to check whether the value in the id column matches an entry in err_Li, for example:
if (df[1,"id"] in err_Li)
println("NOT VALID")
else
end
However, in some of my DataFrames the id column does not exist. In Python, I can handle that with try-except, for example:
try:
    if df['id'][0] in err_Li:
        print('err')
    else:
        pass
except:
    pass
How can I write this control flow in Julia when the column name does not exist in the DataFrame? Is there an equivalent of Python's try-except?
CodePudding user response:
Since you know and expect that some of your dataframes will not contain this column, it is generally better to use ordinary control flow (if-else) in this scenario. Exceptions should be reserved for exceptional situations.
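For comparison, Julia's counterpart to Python's try-except is try/catch. The following is only a minimal sketch of a literal translation of the Python snippet from the question, assuming DataFrames.jl is installed; the variable name flagged is made up for illustration.

```julia
using DataFrames

err_Li = ["12345"]
df2 = DataFrame(blah = "13579", description = rand(5))  # no :id column

# Indexing a column that does not exist throws, so the catch branch runs.
flagged = try
    df2[1, :id] in err_Li
catch
    # Like Python's bare `except: pass`, this swallows every error,
    # not only the missing-column one.
    false
end

flagged && println("NOT VALID")
```

Because the catch branch hides any other failure too (for example an empty DataFrame), an explicit check on the column is usually easier to reason about.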
Here, you can add an extra condition to your if statement, like this:
julia> using DataFrames
julia> err_Li = ["12345"];
julia> df = DataFrame(id= "12345", description= rand(5));
julia> if columnindex(df, :id) > 0 && df[1, :id] in err_Li
println("NOT VALID")
end
NOT VALID
julia> df2 = DataFrame(blah= "13579", description= rand(5));
julia> if columnindex(df2, :id) > 0 && df2[1, :id] in err_Li
println("NOT VALID")
end
julia>
columnindex is a function that takes a DataFrame and a column name as a Symbol (here :id). It returns 0 if the Symbol does not correspond to a column name in the DataFrame; otherwise it returns the position of that column (starting from 1), which is always greater than 0. Because && short-circuits, df[1, :id] is evaluated only when the column exists, so the lookup never runs against a missing column.
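If the same check is needed for many dataframes, it can be wrapped in a small function. This is a sketch under assumptions: first_id_invalid is a hypothetical helper name, not part of DataFrames.jl, and the extra nrow check is my addition to guard against empty dataframes.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): returns true only if the
# column exists, the DataFrame has at least one row, and the first value of
# that column is listed in `bad_ids`.
function first_id_invalid(df::AbstractDataFrame, bad_ids; col::Symbol = :id)
    # columnindex returns 0 when `col` is not a column of `df`;
    # && short-circuits, so df[1, col] is skipped in that case.
    return columnindex(df, col) > 0 && nrow(df) > 0 && df[1, col] in bad_ids
end

err_Li = ["12345"]
df  = DataFrame(id = "12345", description = rand(5))
df2 = DataFrame(blah = "13579", description = rand(5))

first_id_invalid(df, err_Li)   # column present and value listed
first_id_invalid(df2, err_Li)  # column absent, so the lookup is skipped
```

The caller can then write if first_id_invalid(df, err_Li) println("NOT VALID") end without repeating the column test.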
CodePudding user response:
As an alternative to the answer given above, you can use the hasproperty test:
julia> using DataFrames
julia> df = DataFrame(id="12345", description=rand(5));
julia> propertynames(df)
2-element Vector{Symbol}:
:id
:description
julia> hasproperty(df, :id)
true
julia> hasproperty(df, "id")
true
julia> hasproperty(df, :x)
false
julia> hasproperty(df, "x")
false
The reason this works is that data frame objects support getting their columns using the . syntax (which calls getproperty):
julia> df.id
5-element Vector{String}:
"12345"
"12345"
"12345"
"12345"
"12345"
julia> df."id"
5-element Vector{String}:
"12345"
"12345"
"12345"
"12345"
"12345"
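Applied to the check from the question, hasproperty can guard the lookup in the same way. The snippet below is a sketch, assuming the err_Li list from the question; has_bad_id is a made-up name used only for illustration.

```julia
using DataFrames

err_Li = ["12345"]
df  = DataFrame(id = "12345", description = rand(5))
df2 = DataFrame(blah = "13579", description = rand(5))

# Hypothetical one-liner: the lookup runs only when the :id column exists.
has_bad_id(d) = hasproperty(d, :id) && (d[1, :id] in err_Li)

for (name, d) in (("df", df), ("df2", df2))
    if has_bad_id(d)
        println(name, ": NOT VALID")
    else
        println(name, ": no match, or no id column")
    end
end
```

An equivalent membership test is "id" in names(d), since names returns the column names as strings.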