How to convert Pandas DataFrame to Julia DataFrame.jl

Time:05-31

I have not been able to find a way to convert my 30,000 x 1,000 Pandas.jl DataFrame of strings into a DataFrames.jl DataFrame. I have tried earlier Stack Overflow solutions, but none of them worked. What is the best way to convert the DataFrame? Thanks for your help.

Answer:

Preparing data:

julia> import Pandas

julia> import DataFrames

julia> df_df1 = DataFrames.DataFrame(string.(rand(1:10, 10, 5)), :auto)
10×5 DataFrame
 Row │ x1      x2      x3      x4      x5
     │ String  String  String  String  String
─────┼────────────────────────────────────────
   1 │ 6       1       2       5       4
   2 │ 9       5       1       1       9
   3 │ 9       1       5       2       9
   4 │ 6       7       9       1       5
   5 │ 1       10      8       5       1
   6 │ 8       5       9       9       6
   7 │ 9       8       9       8       4
   8 │ 2       6       10      5       4
   9 │ 5       4       8       9       8
  10 │ 5       4       10      5       8

julia> pd_df = Pandas.DataFrame(df_df1)
  x1  x2  x3 x4 x5
0  6   1   2  5  4
1  9   5   1  1  9
2  9   1   5  2  9
3  6   7   9  1  5
4  1  10   8  5  1
5  8   5   9  9  6
6  9   8   9  8  4
7  2   6  10  5  4
8  5   4   8  9  8
9  5   4  10  5  8

And now the conversion you asked for:

julia> DataFrames.DataFrame([col => collect(pd_df[col]) for col in pd_df.pyo.columns])
10×5 DataFrame
 Row │ x1      x2      x3      x4      x5
     │ String  String  String  String  String
─────┼────────────────────────────────────────
   1 │ 6       1       2       5       4
   2 │ 9       5       1       1       9
   3 │ 9       1       5       2       9
   4 │ 6       7       9       1       5
   5 │ 1       10      8       5       1
   6 │ 8       5       9       9       6
   7 │ 9       8       9       8       4
   8 │ 2       6       10      5       4
   9 │ 5       4       8       9       8
  10 │ 5       4       10      5       8

(Unfortunately, Pandas.jl does not correctly support the Tables.jl interface, so a workaround like this seems to be needed. I also chose to convert each Pandas Series into a standard Julia Vector.)
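If you need this conversion in several places, the comprehension above can be wrapped in a small function. This is only a sketch: the name `pandas_to_dataframes` is hypothetical, and it relies on the same `pd_df.pyo.columns` and `pd_df[col]` access used in the answer, so it inherits the same assumptions about Pandas.jl.

```julia
import Pandas
import DataFrames

# Hypothetical helper: build a DataFrames.jl DataFrame column by column,
# turning each Pandas Series into a plain Julia Vector via collect.
function pandas_to_dataframes(pd_df::Pandas.DataFrame)
    cols = [col => collect(pd_df[col]) for col in pd_df.pyo.columns]
    return DataFrames.DataFrame(cols)
end

# Usage: round-trip the sample data and check nothing changed.
df_df1 = DataFrames.DataFrame(string.(rand(1:10, 10, 5)), :auto)
pd_df = Pandas.DataFrame(df_df1)
df_back = pandas_to_dataframes(pd_df)
@assert df_back == df_df1
```

Comparing the result against the original DataFrames.jl table, as in the last line, is a cheap sanity check before running the conversion on the full 30,000 x 1,000 data.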

Answer:

TL;DR: export the data to Arrow or CSV from pandas, then import that file in Julia. (There might be a way to do this with PyCall, but it won't be as easy.)
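A minimal sketch of the CSV route, assuming CSV.jl is installed alongside Pandas.jl and DataFrames.jl. The function name `pandas_to_dataframes_via_csv` is hypothetical; it calls pandas' own `to_csv` through the wrapped `pyo` object and reads the file back with `CSV.read`. Passing `types=String` is an assumption that matches the question's all-string data; without it CSV.jl would infer numeric columns.

```julia
import Pandas
import DataFrames
import CSV

# Hypothetical helper: write the pandas DataFrame to a temporary CSV file
# with pandas itself, then read it back as a DataFrames.jl DataFrame.
function pandas_to_dataframes_via_csv(pd_df::Pandas.DataFrame)
    path = tempname() * ".csv"
    try
        # index=false stops pandas from writing its row index as an extra column.
        pd_df.pyo.to_csv(path, index=false)
        # types=String parses every column as String instead of inferring Int64.
        return CSV.read(path, DataFrames.DataFrame; types=String)
    finally
        rm(path; force=true)  # always clean up the temporary file
    end
end

df_df1 = DataFrames.DataFrame(string.(rand(1:10, 10, 5)), :auto)
df_csv = pandas_to_dataframes_via_csv(Pandas.DataFrame(df_df1))
@assert df_csv == df_df1
```

The Arrow route has the same shape (pandas `to_feather`, then `DataFrames.DataFrame(Arrow.Table(path))` with Arrow.jl), but it additionally needs `pyarrow` on the Python side. Note that CSV.jl reads empty fields as `missing`, so an empty-string cell will not round-trip exactly through CSV.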
