How can I add a new column to a Julia DataFrame

Time:11-16

Let's say I have a DataFrame and a vector such as:

using DataFrames
dataframe = DataFrame(Data1 = rand(10), Data2 = rand(10));
Data3 = rand(10)

I want to add Data3 to the DataFrame so that it looks like this:

    Data1       Data2     Data3
    Float64     Float64   Float64
1   0.757345    0.903133  0.502133
2   0.294749    0.327502  0.323133
3   0.156397    0.427323  0.123133

In Python, I can just write df["Data3"] = Data3 to add a column, but with a Julia DataFrame, df[!, Data3] = Data3 returns:

  • MethodError: no method matching setindex!(::DataFrame, ::Vector{Float64}, ::typeof(!), ::Vector{Float64})

I've also checked this solution, but it gave me:

  • ArgumentError: syntax df[column] is not supported use df[!, column] instead.

How can I add a vector as a new column in a Julia DataFrame?

CodePudding user response:

You were almost there, you are looking for:

dataframe[!, :Data3] = Data3

or

dataframe[!, "Data3"] = Data3

or

dataframe.Data3 = Data3

Note that I'm using a Symbol or a String here. [!, :Data3] is an indexing operation, so it needs identifiers for the rows (!, meaning all rows) and for the column (:Data3) where you want the data to be stored, not the data itself.
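As a minimal sketch of the three spellings above, assuming a recent DataFrames.jl release: the names df_symbol, df_string and df_property are only illustrative copies, made so that each form can be tried on its own.

```julia
using DataFrames

dataframe = DataFrame(Data1 = rand(10), Data2 = rand(10))
Data3 = rand(10)

# Work on copies so each spelling starts from the same two-column frame
df_symbol = copy(dataframe)
df_symbol[!, :Data3] = Data3      # column identified by a Symbol

df_string = copy(dataframe)
df_string[!, "Data3"] = Data3     # column identified by a String

df_property = copy(dataframe)
df_property.Data3 = Data3         # property syntax

# All three frames now have the columns Data1, Data2 and Data3
names(df_symbol)
```

In each case the thing inside the brackets (or after the dot) is the column's name, and the vector only appears on the right-hand side of the assignment.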

You are binding the actual data (a 10-element vector of random numbers) to the variable Data3, so doing dataframe[!, Data3] with the variable Data3 (rather than a Symbol or String with the value "Data3") is equivalent to doing

dataframe[!, rand(10)]

which means "I want to access all rows (!) of a DataFrame, and 10 columns identified by 10 random numbers". Indexing by a random floating-point number doesn't make much sense (what should dataframe[!, 0.532] return?), which is why you get the error you see: setindex! has no method that accepts a Vector{Float64} as a column index.
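To see the failure without stopping a script, the assignment can be wrapped in try/catch. This is only a sketch: the variable err is illustrative, and the exact exception type may differ between DataFrames.jl versions (the question reports a MethodError).

```julia
using DataFrames

dataframe = DataFrame(Data1 = rand(10), Data2 = rand(10))
Data3 = rand(10)

# Passing the data vector where a column identifier is expected fails;
# capture the exception instead of letting it propagate
err = try
    dataframe[!, Data3] = Data3
    nothing
catch e
    e
end

# The failed assignment leaves the DataFrame with its original two columns
ncol(dataframe)
```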

Regarding the Discourse thread you linked: it is very old, and the df["col"] syntax was deprecated a long time ago. The basic indexing concept in DataFrames is that a DataFrame is a two-dimensional data structure, and as such should be indexed by df[row_indices, col_indices].
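A small sketch of the df[row_indices, col_indices] pattern, using a hypothetical three-row frame with fixed values so the results are easy to check by eye:

```julia
using DataFrames

df = DataFrame(Data1 = [0.1, 0.2, 0.3], Data2 = [1.0, 2.0, 3.0])

single_value = df[2, :Data2]              # one row, one column: a single value
column_copy  = df[:, :Data1]              # all rows of one column, as a copy
column_ref   = df[!, :Data1]              # all rows of one column, without copying
sub_frame    = df[1:2, [:Data1, :Data2]]  # rows 1 and 2 of both columns, as a DataFrame
```

The difference between : and ! in the row position is that : returns a copy of the column, while ! returns the vector stored in the DataFrame itself.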

DataFrames supports many ways of specifying valid indices; they are too numerous to cover in detail here, but they are listed in the docs.
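For a flavour of what those column selectors look like, here is a short sketch assuming a recent DataFrames.jl release; the frame and the variable names are hypothetical, and the docs remain the authoritative list.

```julia
using DataFrames

df = DataFrame(Data1 = [0.1, 0.2], Data2 = [1.0, 2.0], Other = ["a", "b"])

by_position = df[:, 1]                        # first column, selected by integer position
by_regex    = df[:, r"^Data"]                 # columns whose names start with "Data"
by_negation = df[:, Not(:Other)]              # every column except Other
by_range    = df[:, Between(:Data1, :Data2)]  # columns from Data1 through Data2
```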
