Initializing a column with missing values and filling in fields later

Time:09-26

How can one initialize a DataFrame column with missing values and later fill in some of its elements with Float64 values?

julia> df = DataFrame(:a => rand(4), :b => rand(4))
4×2 DataFrame
 Row │ a         b        
     │ Float64   Float64  
─────┼────────────────────
   1 │ 0.840074  0.673613
   2 │ 0.98867   0.33807
   3 │ 0.433315  0.150228
   4 │ 0.495254  0.833268

julia> insertcols!(df, :c => missing)
4×3 DataFrame
 Row │ a         b         c       
     │ Float64   Float64   Missing 
─────┼─────────────────────────────
   1 │ 0.840074  0.673613  missing 
   2 │ 0.98867   0.33807   missing 
   3 │ 0.433315  0.150228  missing 
   4 │ 0.495254  0.833268  missing 

julia> for row in eachrow(df)
           if rand() > 0.5 #based on processing of the row
               row[:c] = 1.0
           end
       end
ERROR: MethodError: convert(::Type{Union{}}, ::Float64) is ambiguous.

CodePudding user response:

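The column created by insertcols!(df, :c => missing) has element type Missing, so it cannot store a Float64. Create the column with an element type that allows both: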

df.c = Vector{Union{Float64,Missing}}(missing, size(df, 1))
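As a minimal sketch of how this fits the loop from the question: the `rand() > 0.5` test is the question's placeholder for real per-row processing, and the fixed seed is an assumption added only to make the run reproducible.

```julia
using DataFrames
using Random

Random.seed!(1)  # assumption: fixed seed, only for reproducibility

df = DataFrame(:a => rand(4), :b => rand(4))

# Column whose element type accepts both Float64 and missing
df.c = Vector{Union{Float64,Missing}}(missing, size(df, 1))

for row in eachrow(df)
    if rand() > 0.5  # placeholder for real per-row processing
        row[:c] = 1.0  # no conversion error: Float64 fits the column type
    end
end

eltype(df.c)  # Union{Missing, Float64}
```

Rows for which the condition is false keep their missing value.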

CodePudding user response:

This is the way I normally do it:

julia> using DataFrames

julia> df = DataFrame(:a => rand(4), :b => rand(4))
4×2 DataFrame
 Row │ a         b
     │ Float64   Float64
─────┼────────────────────
   1 │ 0.388546  0.522189
   2 │ 0.232263  0.102722
   3 │ 0.519866  0.578753
   4 │ 0.493797  0.146636

julia> df.c = missings(Float64, nrow(df))
4-element Vector{Union{Missing, Float64}}:
 missing
 missing
 missing
 missing

julia> df
4×3 DataFrame
 Row │ a         b         c
     │ Float64   Float64   Float64?
─────┼──────────────────────────────
   1 │ 0.388546  0.522189   missing
   2 │ 0.232263  0.102722   missing
   3 │ 0.519866  0.578753   missing
   4 │ 0.493797  0.146636   missing
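Once the column has type Float64?, individual entries can be filled in later and the remaining missings handled when reading the column back. A rough sketch, where the values in `a`, the `> 0.5` condition and the `2 * a` computation are hypothetical stand-ins for real processing:

```julia
using DataFrames
using Statistics

df = DataFrame(:a => [0.1, 0.6, 0.3, 0.9])  # hypothetical data
df.c = missings(Float64, nrow(df))

# Fill only the rows that meet a (hypothetical) condition
for i in 1:nrow(df)
    if df.a[i] > 0.5
        df.c[i] = 2 * df.a[i]
    end
end

filled = count(!ismissing, df.c)   # number of entries that were filled
avg = mean(skipmissing(df.c))      # mean over the filled entries only
defaults = coalesce.(df.c, 0.0)    # copy with remaining missings replaced by 0.0
```

`skipmissing` and `coalesce` leave `df.c` itself unchanged, so the missing markers stay available for later processing.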

See also https://bkamins.github.io/julialang/2021/09/03/missing.html for more examples of working with missing values.
