Home > Software engineering >  Error when going from dataframe to excel file in Julia
Error when going from dataframe to excel file in Julia

Time:07-13

I am trying to export a dataframe to xlsx

df |> XLSX.writexlsx("df.xlsx")

and am getting this error:

ERROR: Unsupported datatype String31 for writing data to Excel file. Supported data types are Union{Missing, Bool, Float64, Int64, Dates.Date, Dates.DateTime, Dates.Time, String} or XLSX.CellValue.

I cannot find anyting about String31 utilizing Google. Is there a "transform" that I need to perform on the df beforehand?

CodePudding user response:

It seems that currently this is a limitation of XSLX.jl. I have opened an issue proposing to fix it.

I assume your df is not huge, in which case the following solution should work for you:

to_string(x::AbstractString) = String(x)
to_string(x::Any) = x
XLSX.writetable("df.xlsx", to_string.(df))

CodePudding user response:

I guess you're reading the data in from a CSV file? CSV.jl uses fixed-length strings from the InlineStrings.jl package when it reads from a file, which can be much better for managing memory allocation, and hence for performance. For eg.:

julia> df = CSV.read(IOBuffer("S, I, F
                   hello Julian world, 2, 3.14"), DataFrame)
1×3 DataFrame
 Row │ S                    I      F      
     │ String31            Int64  Float64 
─────┼────────────────────────────────────
   1 │ hello Julian world      2     3.14

Here, the S column was assigned a String31 type since that's the smallest type from InlineStrings that can hold our string.

However, it looks like XLSX.jl doesn't know how to handle InlineString types, unfortunately.

You can specify when reading the file that you want all strings to be read as just String type and not the fixed-length InlineString variety.

julia> df = CSV.read(IOBuffer("S, I, F
                   hello Julian world, 2, 3.14"), DataFrame, stringtype = String)
1×3 DataFrame
 Row │ S                    I      F      
     │ String              Int64  Float64 
─────┼────────────────────────────────────
   1 │ hello Julian world      2     3.14

Or, you can transform the column just before writing to the file.

julia> transform!(df, :S => ByRow(String) => :S)
1×3 DataFrame
 Row │ S                    I      F      
     │ String              Int64  Float64 
─────┼────────────────────────────────────
   1 │ hello Julian world      2     3.14

If you have multiple InlineString type columns and don't want to specify them individually, you can also do:

julia> to_string(c::AbstractVector) = c;

julia> to_string(c::AbstractVector{T}) where T <: AbstractString = String.(c);

julia> mapcols!(to_string, df)
1×3 DataFrame
 Row │ S                    I      F      
     │ String              Int64  Float64 
─────┼────────────────────────────────────
   1 │ hello Julian world      2     3.14

(Edited to use multiple dispatch here (based on Bogumil Kaminsky's answer) rather than typecheck in the function.)

  • Related