I have a binary file containing records from a C struct, and I would like to read that file into a Polars DataFrame.
I can accomplish that as shown below, but I'm wondering whether there is a more direct path.
My current solution involves:
- Reading the file into a NumPy record array (see below) using np.fromfile()
- Converting that into a Pandas DataFrame
- Converting that to a Polars DataFrame
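For context, the first step might look like the sketch below. It assumes the C struct is packed (no padding between fields), so each record is 14 bytes. If the struct uses the compiler's default alignment, passing align=True to np.dtype is one way to get C-like padding. The file name records.bin and the temporary file are hypothetical stand-ins for the real data file.

```python
import os
import tempfile

import numpy as np

# Field layout from the question; assumes the C struct is packed (no padding).
record_dtype = np.dtype(
    [("id", "<i4"), ("yr", "<u2"), ("sex", "<u2"), ("val1", "<i2"), ("val2", "<f4")]
)

# Three sample records, written to a temporary file that stands in for the real one.
sample = np.array(
    [(1, 2002, 2, 13, 0.3), (2, 2005, 1, -10, 1.5), (3, 2004, 2, 54, -0.12)],
    dtype=record_dtype,
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.bin")  # hypothetical file name
    sample.tofile(path)

    # np.fromfile interprets the raw bytes as consecutive records of record_dtype.
    data = np.fromfile(path, dtype=record_dtype)

print(data.dtype.names)
print(data["yr"])
```

The result is a one-dimensional structured array in which each field can be selected by name, for example data["yr"].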
import numpy as np
import pandas as pd
import polars as pl

# Data read in from file using np.fromfile()
data = np.array([(1, 2002, 2, 13, 0.3),
                 (2, 2005, 1, -10, 1.5),
                 (3, 2004, 2, 54, -0.12)],
                dtype=[("id", "<i4"), ("yr", "<u2"), ("sex", "<u2"), ("val1", "<i2"), ("val2", "<f4")]
)
# Go through pandas, which splits the record fields into columns
df = pl.from_pandas(pd.DataFrame(data))
df
id   yr    sex  val1  val2
i32  u16   u16  i16   f32
1    2002  2    13    0.3
2    2005  1    -10   1.5
3    2004  2    54    -0.12
I've tried reading the data directly into Polars from NumPy using pl.DataFrame(data) or pl.from_records(data), but in both cases I get a single-column DataFrame of type "object", which I can't work out how to split into separate columns or convert to a struct.
Answer:
import numpy as np
import polars as pl

data = np.array([(1, 2002, 2, 13, 0.3),
                 (2, 2005, 1, -10, 1.5),
                 (3, 2004, 2, 54, -0.12)],
                dtype=[("id", "<i4"), ("yr", "<u2"), ("sex", "<u2"), ("val1", "<i2"), ("val2", "<f4")]
)
# Build one Polars column per field of the structured array, skipping pandas
pl.DataFrame(
    {
        field_name: data[field_name]
        for field_name in data.dtype.fields
    }
)
┌─────┬──────┬─────┬──────┬───────┐
│ id ┆ yr ┆ sex ┆ val1 ┆ val2 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ u16 ┆ u16 ┆ i16 ┆ f32 │
╞═════╪══════╪═════╪══════╪═══════╡
│ 1 ┆ 2002 ┆ 2 ┆ 13 ┆ 0.3 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2 ┆ 2005 ┆ 1 ┆ -10 ┆ 1.5 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3 ┆ 2004 ┆ 2 ┆ 54 ┆ -0.12 │
└─────┴──────┴─────┴──────┴───────┘