I want to perform a simple one-hot encoding using DataFrames.jl's transform!, but without success. I'm using the following DataFrame:
using DataFrames
df = DataFrame(
    color = ["red", "green", "blue"],
    x = [1, 2, 3]
)
# 3×2 DataFrame
#  Row │ color   x
#      │ String  Int64
# ─────┼───────────────
#    1 │ red         1
#    2 │ green       2
#    3 │ blue        3
And I defined a simple function to return the encoded matrix:
function OneHotEncod(vec::Vector{String})
    # One Bool column per entry of vec, marking the rows equal to that entry
    reduce(hcat, [vec .== i for i in vec])
end
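Note that OneHotEncod loops over vec itself rather than over its distinct values, so an input with repeated entries (for example two "red" rows) produces duplicate columns. A minimal sketch of a variant that encodes against unique(vec) instead, using broadcasting against a 1×k row of levels; the name onehot_unique is hypothetical and not part of DataFrames.jl:

```julia
# Hypothetical variant of OneHotEncod: one column per *distinct* value,
# in order of first appearance, so repeated values share a column.
function onehot_unique(vec::AbstractVector)
    levels = unique(vec)                # distinct values, first-appearance order
    return vec .== permutedims(levels)  # n×k BitMatrix via broadcasting
end

m = onehot_unique(["red", "green", "red", "blue"])
# 4×3 BitMatrix; rows 1 and 3 both set column 1 ("red")
```

Broadcasting also sidesteps a quirk of reduce(hcat, ...): with a single distinct value, reduce returns the lone vector rather than an n×1 matrix.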
Then, when I run the following code, I get an error:
transform!(df, Cols(:color) => x -> OneHotEncod(x), renamecols=false)
ERROR: ArgumentError: Table returned, but a single output column was expected
The error is clear, but is there any way to use transform! when the function returns more than one vector (e.g. a Matrix)?
Appendix:
OneHotEncod(df.color)
# 3×3 BitMatrix:
#  1  0  0
#  0  1  0
#  0  0  1
Answer:
You just need to specify that the output has multiple columns like this:
julia> transform!(df, :color => OneHotEncod => AsTable)
3×5 DataFrame
 Row │ color   x      x1     x2     x3
     │ String  Int64  Bool   Bool   Bool
─────┼────────────────────────────────────
   1 │ red         1   true  false  false
   2 │ green       2  false   true  false
   3 │ blue        3  false  false   true
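If you want the AsTable route to produce named columns as well, the function can return a NamedTuple of vectors instead of a matrix; DataFrames.jl treats a NamedTuple of vectors as a table and takes the new column names from its keys. A minimal sketch, where onehot_named is a hypothetical helper:

```julia
using DataFrames

# Hypothetical helper: returns a NamedTuple of Bool vectors keyed by the
# distinct values of `v`; with `=> AsTable` the keys become column names
# instead of the auto-generated x1, x2, ...
function onehot_named(v::AbstractVector)
    levels = unique(v)
    syms = Tuple(Symbol.(levels))         # e.g. (:red, :green, :blue)
    cols = Tuple(v .== l for l in levels) # one Bool vector per level
    return NamedTuple{syms}(cols)
end

df = DataFrame(color = ["red", "green", "blue"], x = [1, 2, 3])
transform!(df, :color => onehot_named => AsTable)
```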
A natural alternative is:
julia> transform!(df, [:color => ByRow(==(c)) => c for c in unique(df.color)])
3×8 DataFrame
 Row │ color   x      x1     x2     x3     red    green  blue
     │ String  Int64  Bool   Bool   Bool   Bool   Bool   Bool
─────┼─────────────────────────────────────────────────────────
   1 │ red         1   true  false  false   true  false  false
   2 │ green       2  false   true  false  false   true  false
   3 │ blue        3  false  false   true  false  false   true
since this automatically gives the new columns informative names.
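One caveat with naming columns after the raw values is that a value matching an existing column name (a color called "x", say) would make transform! replace that column. A minimal sketch of a hypothetical helper, onehot!, that wraps the same ByRow comprehension and prefixes each new column with the source column's name:

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): adds one Bool column per
# distinct value of `col`, named "<col>_<value>", e.g. "color_red".
function onehot!(df::DataFrame, col::Symbol)
    levels = unique(df[!, col])
    # Same ByRow(==(level)) comprehension, with prefixed target names
    return transform!(df, [col => ByRow(==(l)) => "$(col)_$(l)" for l in levels])
end

df = DataFrame(color = ["red", "green", "blue", "red"], x = [1, 2, 3, 4])
onehot!(df, :color)
```

The prefix is just one naming choice; any string built from col and l works as the target name.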