I want to test that two Polars DataFrame objects are equal when they contain a column that represents dates.
If I use datetime.date from the standard library, I don't have any problems:
import datetime as dt
import polars as pl
from polars.testing import assert_frame_equal
assert_frame_equal(pl.DataFrame({"foo": [1], "bar": [dt.date(2000, 1, 1)]}), pl.DataFrame({"foo": [1], "bar": [dt.date(2000, 1, 1)]}))
But if I try to use the Date type from polars, the comparison fails with a PanicException: not implemented.
assert_frame_equal(pl.DataFrame({"foo": [1], "bar": [pl.Date(2000, 1, 1)]}), pl.DataFrame({"foo": [1], "bar": [pl.Date(2000, 1, 1)]}))
Is there a way to use the polars Date type in the DataFrame and still be able to compare the two objects?
CodePudding user response:
I don't think you're supposed to use pl.Date like that. If you do, the bar column of your DataFrame ends up with dtype object, which is probably not what you wanted:
In [2]: pl.DataFrame({"foo": [1], "bar": [pl.Date(2000, 1, 1)]})
Out[2]:
shape: (1, 2)
┌─────┬─────────────────────────────────────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ object │
╞═════╪═════════════════════════════════════╡
│ 1 ┆ <polars.datatypes.Date object at... │
└─────┴─────────────────────────────────────┘
Instead, do:
df1 = pl.DataFrame({"foo": [1], "bar": ['2000-01-01']}).with_columns(pl.col('bar').str.strptime(pl.Date))
df2 = pl.DataFrame({"foo": [1], "bar": ['2000-01-01']}).with_columns(pl.col('bar').str.strptime(pl.Date))
assert_frame_equal(df1, df2)
This works fine, because both bar columns now have the Date dtype.
CodePudding user response:
You should not put pl.Date in a DataFrame.
pl.Date is a dtype object and should not be used to instantiate values. The column type of the DataFrame also indicates that polars does not know what to do with it, as it inferred it as object dtype (read: "I don't know what you gave me").
Python's native datetime, date, time, and timedelta objects are the way to go.
import polars as pl
from polars.testing import assert_frame_equal
from datetime import date
assert_frame_equal(pl.DataFrame({"foo": [1], "bar": [date(2000, 1, 1)]}),
pl.DataFrame({"foo": [1], "bar": [date(2000, 1, 1)]}))