I have not been able to find a way to convert my 30,000 x 1,000 Pandas.jl DataFrame of Strings into a DataFrames.jl DataFrame. I have tried solutions from previous Stack Overflow questions, but they did not work. What is the best way to do this conversion? Thanks for your help.
Answer:
Preparing data:
julia> import Pandas
julia> import DataFrames
julia> df_df1 = DataFrames.DataFrame(string.(rand(1:10, 10, 5)), :auto)
10×5 DataFrame
 Row │ x1      x2      x3      x4      x5
     │ String  String  String  String  String
─────┼────────────────────────────────────────
   1 │ 6       1       2       5       4
   2 │ 9       5       1       1       9
   3 │ 9       1       5       2       9
   4 │ 6       7       9       1       5
   5 │ 1       10      8       5       1
   6 │ 8       5       9       9       6
   7 │ 9       8       9       8       4
   8 │ 2       6       10      5       4
   9 │ 5       4       8       9       8
  10 │ 5       4       10      5       8
julia> pd_df = Pandas.DataFrame(df_df1)
   x1  x2  x3  x4  x5
0   6   1   2   5   4
1   9   5   1   1   9
2   9   1   5   2   9
3   6   7   9   1   5
4   1  10   8   5   1
5   8   5   9   9   6
6   9   8   9   8   4
7   2   6  10   5   4
8   5   4   8   9   8
9   5   4  10   5   8
And now the conversion you asked about:
julia> DataFrames.DataFrame([col => collect(pd_df[col]) for col in pd_df.pyo.columns])
10×5 DataFrame
 Row │ x1      x2      x3      x4      x5
     │ String  String  String  String  String
─────┼────────────────────────────────────────
   1 │ 6       1       2       5       4
   2 │ 9       5       1       1       9
   3 │ 9       1       5       2       9
   4 │ 6       7       9       1       5
   5 │ 1       10      8       5       1
   6 │ 8       5       9       9       6
   7 │ 9       8       9       8       4
   8 │ 2       6       10      5       4
   9 │ 5       4       8       9       8
  10 │ 5       4       10      5       8
(Unfortunately, Pandas.jl does not correctly support the Tables.jl interface, so a workaround like this seems to be needed. I also chose to drop the pandas Series wrapper and convert each column to a standard Julia Vector with collect.)
Answer:
TL;DR: export the data from pandas to Arrow or CSV, then import that file into DataFrames.jl. (There might be a way to do this directly with PyCall, but it won't be as easy.)
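A minimal sketch of the CSV route. It assumes CSV.jl is installed alongside Pandas.jl and that the wrapped pandas object is reachable through the pyo field (as in the first answer), so pandas' own to_csv can be called through PyCall. The function name pandas_to_dataframe_via_csv is hypothetical. types=String is passed so that the all-String data is not re-typed as integers on import:

```julia
import Pandas
import DataFrames
import CSV

# Hypothetical helper (not part of Pandas.jl, DataFrames.jl, or CSV.jl):
# round-trips a Pandas.jl DataFrame through a temporary CSV file.
function pandas_to_dataframe_via_csv(pd_df)
    path = tempname() * ".csv"
    try
        # Let pandas write the file via the wrapped PyObject (the `pyo` field);
        # index=false keeps pandas' 0-based row index from becoming a column.
        pd_df.pyo.to_csv(path, index=false)
        # Parse every column as String so values like "10" stay strings.
        # Note: empty fields are still read back as `missing` by default.
        return CSV.read(path, DataFrames.DataFrame; types=String)
    finally
        # Remove the temporary file even if reading fails.
        rm(path; force=true)
    end
end

# Usage with sample data built the same way as in the first answer:
df_df1 = DataFrames.DataFrame(string.(rand(1:10, 10, 5)), :auto)
pd_df = Pandas.DataFrame(df_df1)
df_back = pandas_to_dataframe_via_csv(pd_df)
```

An Arrow version would follow the same pattern, using pandas' to_feather (which requires pyarrow on the Python side) and Arrow.Table from Arrow.jl on the Julia side. That variant is not sketched here.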