Control flow with column name in Julia dataframe

Time:11-12

I am looking for a way to handle the case where a column name does not exist in a Julia dataframe.

In more detail, let's say I have the following dataframe and list:

using DataFrames

df = DataFrame(id= "12345", description= rand(5));
err_Li = ["12345"]

I need to check whether the id column matches an entry in err_Li, such as:

if df[1, "id"] in err_Li
    println("NOT VALID")
end

However, in some of my dataframes the id column does not exist. In Python, I can handle this with try-except, such as:

try:
    if df['id'][0] in err_Li:
        print('err')
except:
    pass

How can I do this kind of control flow in Julia when the column does not exist in the dataframe? Is there an equivalent of Python's try-except?

CodePudding user response:

Since you know and expect that some of your dataframes will not contain this column, it is generally better to use ordinary control flow (if-else) in this scenario. Exceptions should be reserved for exceptional situations.
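For comparison, Julia does have a counterpart to Python's try-except, namely try/catch. The following is only a sketch of what that would look like here: the helper name first_id_flagged_trycatch is made up for illustration, and it assumes that DataFrames.jl reports a missing column as an ArgumentError, which may differ between versions.

```julia
using DataFrames

err_Li = ["12345"]

# Hypothetical helper: true when the first `id` value is listed in `err_list`,
# false when the data frame has no `id` column.
function first_id_flagged_trycatch(df, err_list)
    try
        return df[1, :id] in err_list
    catch e
        # Assumption: a missing column surfaces as an ArgumentError.
        # Any other error is unexpected here, so it is rethrown.
        e isa ArgumentError || rethrow()
        return false
    end
end

df  = DataFrame(id="12345", description=rand(5))
df2 = DataFrame(blah="13579", description=rand(5))

println(first_id_flagged_trycatch(df, err_Li))
println(first_id_flagged_trycatch(df2, err_Li))
```

This works, but it hides the "column is missing" case inside error handling, which is why checking for the column up front is usually the clearer choice.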

In this case, you can add an extra condition to your if statement, like this:

julia> df = DataFrame(id= "12345", description= rand(5));

julia> if columnindex(df, :id) > 0 && df[1, :id] in err_Li
           println("NOT VALID")
       end
NOT VALID

julia> df2 = DataFrame(blah= "13579", description= rand(5));

julia> if columnindex(df2, :id) > 0 && df2[1, :id] in err_Li
           println("NOT VALID")
       end

columnindex takes a DataFrame and a column name (here the Symbol :id) and returns the position of that column, counting from 1, or 0 if the DataFrame has no column with that name. A result greater than 0 therefore means the column exists. Because && short-circuits, df[1, :id] is only evaluated when the column check has passed, so no error is thrown for a DataFrame without an id column.
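If the same check is needed for many dataframes, it could be wrapped in a small function. This is a minimal sketch: the name first_id_flagged and its col keyword are hypothetical, and the extra nrow guard for empty dataframes is an addition that is not part of the check above.

```julia
using DataFrames

# Hypothetical helper wrapping the columnindex check.
function first_id_flagged(df, err_list; col=:id)
    # columnindex returns 0 when `col` is not a column of `df`.
    columnindex(df, col) > 0 || return false
    # Added guard (assumption): an empty data frame has no first row to test.
    nrow(df) > 0 || return false
    return df[1, col] in err_list
end

err_Li = ["12345"]
df  = DataFrame(id="12345", description=rand(5))
df2 = DataFrame(blah="13579", description=rand(5))

for (name, d) in [("df", df), ("df2", df2)]
    if first_id_flagged(d, err_Li)
        println(name, ": NOT VALID")
    end
end
```

Only the dataframe that has an id column with a listed value is reported; the one without the column is skipped without raising an error.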

CodePudding user response:

As an alternative to the answer given above, you can test for the column with hasproperty:

julia> using DataFrames

julia> df = DataFrame(id="12345", description=rand(5));

julia> propertynames(df)
2-element Vector{Symbol}:
 :id
 :description

julia> hasproperty(df, :id)
true

julia> hasproperty(df, "id")
true

julia> hasproperty(df, :x)
false

julia> hasproperty(df, "x")
false

This works because data frames expose their columns as properties, which is also why a column can be retrieved with the . syntax (which calls getproperty):

julia> df.id
5-element Vector{String}:
 "12345"
 "12345"
 "12345"
 "12345"
 "12345"

julia> df."id"
5-element Vector{String}:
 "12345"
 "12345"
 "12345"
 "12345"
 "12345"
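Applied to the original question, the hasproperty test can guard the lookup in the same way. The snippet below is a sketch that reuses the df and err_Li from the question; df2 is an illustrative dataframe without an id column, and the has_id_via_names variable is only there to show that testing against names(df) is another option.

```julia
using DataFrames

err_Li = ["12345"]
df  = DataFrame(id="12345", description=rand(5))
df2 = DataFrame(blah="13579", description=rand(5))  # no id column

for (name, d) in [("df", df), ("df2", df2)]
    # && short-circuits, so d[1, :id] is only read when the column exists.
    if hasproperty(d, :id) && d[1, :id] in err_Li
        println(name, ": NOT VALID")
    else
        println(name, ": no id column or id not in err_Li")
    end
end

# names(df) returns the column names as strings, so this is an equivalent test.
has_id_via_names = "id" in names(df)
```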