How to read a C struct (or Numpy record array) into a Polars Dataframe?

Time:04-30

I have a binary file containing records from a C struct. I would like to read that file into a Polars Dataframe.

I can accomplish that as below, but I'm wondering if there is a more direct path?

My current solution involves:

  • Reading the file into a Numpy record array (see below) using np.fromfile()
  • Converting that into a Pandas DataFrame
  • Converting that to a Polars DataFrame

import numpy as np
import pandas as pd
import polars as pl

# Data read in from file using np.fromfile()
data = np.array([(1, 2002, 2, 13, 0.3),
                 (2, 2005, 1, -10, 1.5),
                 (3, 2004, 2, 54, -0.12)],
    dtype=[("id", "<i4"),("yr", "<u2"),("sex", "<u2"),("val1", "<i2"),("val2", "<f4")]
)
# Round-trip through Pandas to get one Polars column per field
df = pl.from_pandas(pd.DataFrame(data))
df
 id   yr    sex val1    val2
i32  u16    u16  i16     f32
  1 2002      2   13     0.3
  2 2005      1  -10     1.5
  3 2004      2   54   -0.12
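
The snippet above builds the array inline rather than showing the np.fromfile() step itself. A minimal sketch of that step, assuming the file holds tightly packed records with no padding between fields (a C struct compiled with default alignment may include padding, in which case the dtype would need to account for it). The file name records.bin is hypothetical, and a temporary file stands in for the real binary file:

```python
import os
import tempfile

import numpy as np

# Record layout taken from the question; it must match the bytes on disk exactly.
record_dtype = np.dtype(
    [("id", "<i4"), ("yr", "<u2"), ("sex", "<u2"), ("val1", "<i2"), ("val2", "<f4")]
)

records = np.array(
    [(1, 2002, 2, 13, 0.3), (2, 2005, 1, -10, 1.5), (3, 2004, 2, 54, -0.12)],
    dtype=record_dtype,
)

# Write the raw bytes to a throwaway file (a stand-in for the real binary file),
# then read them back as a structured array.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "records.bin")  # hypothetical file name
    records.tofile(path)
    data = np.fromfile(path, dtype=record_dtype)

print(data.dtype.names)  # field names become the column names later on
print(data["yr"])        # each field can be pulled out as a plain 1-D array
```

Each record here occupies record_dtype.itemsize bytes, so the file size should be a whole multiple of that value if the dtype matches the struct.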

I've tried reading the data directly into Polars from Numpy using pl.DataFrame(data) or pl.from_records(data), but in both cases I get a single-column DataFrame of type "object", which I can't work out how to split into separate columns or convert to a struct.

CodePudding user response:

import numpy as np
import polars as pl

data = np.array([(1, 2002, 2, 13, 0.3),
                (2, 2005, 1, -10, 1.5),
                (3, 2004, 2, 54, -0.12)],
    dtype=[("id", "<i4"),("yr", "<u2"),("sex", "<u2"),("val1", "<i2"),("val2", "<f4")]
)

# Build one Polars column per field of the record array, skipping Pandas
pl.DataFrame(
    {
        field_name: data[field_name]
        for field_name in data.dtype.fields
    }
)
┌─────┬──────┬─────┬──────┬───────┐
│ id  ┆ yr   ┆ sex ┆ val1 ┆ val2  │
│ --- ┆ ---  ┆ --- ┆ ---  ┆ ---   │
│ i32 ┆ u16  ┆ u16 ┆ i16  ┆ f32   │
╞═════╪══════╪═════╪══════╪═══════╡
│ 1   ┆ 2002 ┆ 2   ┆ 13   ┆ 0.3   │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2   ┆ 2005 ┆ 1   ┆ -10  ┆ 1.5   │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3   ┆ 2004 ┆ 2   ┆ 54   ┆ -0.12 │
└─────┴──────┴─────┴──────┴───────┘