Compare Polars DataFrames That Have a Polars Date Column

Time:01-11

I want to test that two Polars DataFrame objects, each containing a column that represents dates, are equivalent.

If I use datetime.date from the standard library I don't have any problems:

import datetime as dt

import polars as pl
from polars.testing import assert_frame_equal

assert_frame_equal(pl.DataFrame({"foo": [1], "bar": [dt.date(2000, 1, 1)]}), pl.DataFrame({"foo": [1], "bar": [dt.date(2000, 1, 1)]}))

But if I try to use the Date type from polars, the comparison fails with a PanicException: not implemented.

assert_frame_equal(pl.DataFrame({"foo": [1], "bar": [pl.Date(2000, 1, 1)]}), pl.DataFrame({"foo": [1], "bar": [pl.Date(2000, 1, 1)]}))

Is there a way to use the polars Date type in the DataFrame and still be able to compare the two objects?

CodePudding user response:

I don't think you're supposed to use pl.Date like that, otherwise your DataFrame is of dtype object, which is probably not what you wanted:

In [2]: pl.DataFrame({"foo": [1], "bar": [pl.Date(2000, 1, 1)]})
Out[2]:
shape: (1, 2)
┌─────┬─────────────────────────────────────┐
│ foo ┆ bar                                 │
│ --- ┆ ---                                 │
│ i64 ┆ object                              │
╞═════╪═════════════════════════════════════╡
│ 1   ┆ <polars.datatypes.Date object at... │
└─────┴─────────────────────────────────────┘

Instead, do:

# with_columns is used here because recent polars releases no longer provide the older with_column
df1 = pl.DataFrame({"foo": [1], "bar": ['2000-01-01']}).with_columns(pl.col('bar').str.strptime(pl.Date))
df2 = pl.DataFrame({"foo": [1], "bar": ['2000-01-01']}).with_columns(pl.col('bar').str.strptime(pl.Date))

assert_frame_equal(df1, df2)

This works fine, because both "bar" columns now have the Date dtype instead of object.

CodePudding user response:

You should not put pl.Date in a DataFrame.

pl.Date is a data type (dtype), not a value constructor, and should not be used to instantiate values. The column type of the DataFrame also shows that polars does not know what to do with it: it inferred the column as object dtype (read: "I don't know what you gave me").

Python's native datetime, date, time, and timedelta objects are the way to go.

import polars as pl
from polars.testing import assert_frame_equal
from datetime import date

assert_frame_equal(pl.DataFrame({"foo": [1], "bar": [date(2000, 1, 1)]}), 
                   pl.DataFrame({"foo": [1], "bar": [date(2000, 1, 1)]}))