I was writing pandas DataFrames to disk using DataFrame.to_feather(), and I noticed that after reading them back, some code that previously worked now failed. I checked, and the reason is that my original DataFrame has some columns with list values, and those values get converted to numpy.ndarray when written to Feather (or Parquet), so reading them back from Feather doesn't reproduce the original types.
I read the pyarrow documentation and searched the pandas issues, but I didn't find anything. My workaround is to write the DataFrames as pickle files, but those are four times bigger. I'm not sure whether this is a bug, but silently converting types without any warning certainly looks like one.
I'm on pandas 1.4.4 and pyarrow 9.0.0.
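Here's a minimal sketch of what I'm seeing; the tags column and the file name are just for illustration:

import pandas as pd

df = pd.DataFrame({'tags': [['a', 'b'], ['c']]})
print(type(df['tags'].iloc[0]))            # <class 'list'>

# Round-trip through Feather
df.to_feather('roundtrip.feather')
restored = pd.read_feather('roundtrip.feather')
print(type(restored['tags'].iloc[0]))      # <class 'numpy.ndarray'>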
CodePudding user response:
It's the standard behaviour of pyarrow to represent list arrays as numpy arrays when converting an Arrow table to pandas. You can write some simple Python code to convert your list columns from np.ndarray back to list:
import pyarrow as pa
import pyarrow.feather

# Read the file as an Arrow table first, so the schema still tells us
# which columns are list-typed
table = pyarrow.feather.read_table('file.feather')
df = table.to_pandas()

# Convert every list-typed column from numpy arrays back to Python lists
for field in table.schema:
    if pa.types.is_list(field.type):
        df[field.name] = df[field.name].apply(list)
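Going through pyarrow.feather.read_table instead of pd.read_feather matters here: the Arrow schema records which columns are list-typed, so you don't have to guess from the numpy dtypes. As a quick check, on the hypothetical tags column from the question you'd get:

print(type(df['tags'].iloc[0]))   # <class 'list'> again after the conversion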