"ERROR: ArgumentError: Table returned but a single output column was expected" in transfor

Time:12-29

I want to do a simple one-hot encoding with DataFrames.jl's transform!, but I can't get it to work. I use the following DataFrame:

using DataFrames

df = DataFrame(
  color = ["red", "green", "blue"],
  x = [1, 2, 3]
)
# 3×2 DataFrame
#  Row │ color   x
#      │ String  Int64
# ─────┼───────────────
#    1 │ red         1
#    2 │ green       2
#    3 │ blue        3

And I defined a simple function to return the encoded matrix:

function OneHotEncod(vec::Vector{String})
  reduce(hcat, [vec .== i for i=vec])
end

Then, when I run the following code, I get an error:

transform!(df, Cols(:color) => x -> OneHotEncod(x), renamecols=false)
ERROR: ArgumentError: Table returned, but a single output column was expected

The error is clear, but I wonder if there is any way to use transform! while the specified function returns more than one vector (like a Matrix)?


Appendix:

OneHotEncod(df.color)
# 3×3 BitMatrix:
#  1  0  0
#  0  1  0
#  0  0  1

Answer:

Your function returns a matrix, so you just need to tell transform! that the output has multiple columns by using AsTable as the target:

julia> transform!(df, :color => OneHotEncod => AsTable)
3×5 DataFrame
 Row │ color   x      x1     x2     x3
     │ String  Int64  Bool   Bool   Bool
─────┼────────────────────────────────────
   1 │ red         1   true  false  false
   2 │ green       2  false   true  false
   3 │ blue        3  false  false   true
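If the generated names x1, x2, x3 are too generic, the target can also be a vector of column names instead of AsTable. The following is a minimal sketch, assuming the values in :color are distinct so that the matrix has exactly one column per name; it uses the non-mutating transform and the hypothetical variable out so that df itself is left unchanged.

```julia
using DataFrames

df = DataFrame(color = ["red", "green", "blue"], x = [1, 2, 3])

# Same encoder as in the question: one Bool column per element of vec
OneHotEncod(vec::Vector{String}) = reduce(hcat, [vec .== i for i in vec])

# Assumption: all colors are distinct, so the number of matrix columns
# matches the number of target names taken from df.color
out = transform(df, :color => OneHotEncod => df.color)
```

Here the matrix columns are matched to the names by position, so the order of the names has to follow the order of the matrix columns.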

A natural alternative is:

julia> transform!(df, [:color => ByRow(==(c)) => c for c in unique(df.color)])
3×8 DataFrame
 Row │ color   x      x1     x2     x3     red    green  blue
     │ String  Int64  Bool   Bool   Bool   Bool   Bool   Bool
─────┼─────────────────────────────────────────────────────────
   1 │ red         1   true  false  false   true  false  false
   2 │ green       2  false   true  false  false   true  false
   3 │ blue        3  false  false   true  false  false   true

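This way the new columns automatically get informative names. (The x1, x2, x3 columns appear in this output only because the earlier transform! call already modified df in place.)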
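Note that OneHotEncod from the question builds one matrix column per row, so a column with repeated values would yield duplicated columns. One way to combine AsTable with informative names is to return a NamedTuple of vectors keyed by the distinct values. This is only a sketch: onehot_named is a hypothetical helper, and the four-row DataFrame is made-up illustration data, not part of the original question.

```julia
using DataFrames

# Hypothetical helper: returns a NamedTuple with one Bool vector
# per distinct value of v, named after that value
function onehot_named(v::AbstractVector{<:AbstractString})
    levels = unique(v)
    return (; (Symbol(l) => (v .== l) for l in levels)...)
end

# Illustration data with a repeated value ("red" appears twice)
df = DataFrame(color = ["red", "green", "blue", "red"], x = 1:4)

# AsTable tells transform that the function returns several columns;
# the column names are taken from the NamedTuple keys
out = transform(df, :color => onehot_named => AsTable)
```

Because the names come from the NamedTuple keys, no separate list of target names is needed, and repeated values map to a single column.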
