Writing a pandas DataFrame as a feather or parquet file converts list values into numpy arrays

Time:10-31

I was writing pandas DataFrames to disk using DataFrame.to_feather() and noticed that, after reading them back, some code that previously worked now failed. I checked, and the reason is that my original DataFrame has some columns whose values are of type list, and those values come back as numpy.ndarray after a round trip through feather (or parquet). Reading the file back therefore doesn't reproduce the original types.

I read the pyarrow documentation and searched the pandas issues, but I didn't find anything. My workaround is to write the DataFrames as pickle files, but those are 4 times bigger. I'm not sure whether this is a bug, but converting types without any warning looks like one.

I'm on pandas 1.4.4 and pyarrow 9.0.0.

User response:

It's the standard behaviour of pyarrow to represent list arrays as numpy arrays when converting an Arrow table to pandas. The conversion happens when the file is read back, not when it is written.

You can write some simple Python code to convert your list columns from np.ndarray back to list:

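import pyarrow as pa
import pyarrow.feather

# Read the file as an Arrow table so the schema still records which columns are lists.
table = pyarrow.feather.read_table('file.feather')
df = table.to_pandas()

# Convert the values of each list-typed column from numpy.ndarray back to Python lists.
for field in table.schema:
    if pa.types.is_list(field.type):
        df[field.name] = df[field.name].apply(list)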
