I have to filter my products by the status name. I have a subquery to get the last status for each product (the last status is the only one relevant for my usecase).
I want to get Product
whose last status matches a list of status names (statuses
in my example).
Models:
class Product(BaseModel):
__tablename__ = "products"
id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
statuses = relationship("Status", back_populates="product")
class Status(BaseModel):
__tablename__ = "status"
id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
name = Column(String(255), nullable=False)
created_at = Column(DateTime, server_default=functions.now(), index=True)
Query:
subquery = (
select(Status.id)
.filter(
Status.product_id == Product.id,
)
.order_by(Status.created_at.desc())
.limit(1)
.correlate(Status)
.subquery()
)
I want to filter on status names with the following code:
statement = (
select(Product.id)
.outerjoin(Status)
.filter(
Status.id.in_(subquery),
Status.name.in_(statuses),
)
)
But unfortunately, I got products with the last status not matching my statuses
.
I think this is because I wrote my SQLAlchemy request like
Give me products with status matching the
statuses
and it looks to the last status matching the statuses
,
instead of
Give me products with THE LAST status matching the
statuses
for it to look to the last status AND check if this status match the statuses
.
Why this result?
CodePudding user response:
What you need in your subquery is a GROUP BY
clause, so that the subquery contains the latest status for every product, rather than only having the latest status overall (which is what was happening in your subquery when you used LIMIT 1
). Then you do your join and filter for only the status values that you want.
import sqlalchemy as sa
subquery = (
select(Status)
.order_by(Status.created_at.desc())
.group_by(Status.product_id)
.subquery()
)
active_products_query = (
select(Product.id)
.join(subquery, Product.id == subquery.c.product_id)
.filter(subquery.c.name.in_(["active"]))
)
print(session.execute(active_products_query).scalars().all())
CodePudding user response:
Option-1: One way would be to add a JOIN
with the filter which will select only the latest Status
:
query = (
select(Product)
.innerjoin(
Status,
Status.id == (
select(Status.id)
.filter(Status.product_id == Product.id)
.order_by(Status.created_at.desc())
.limit(1)
.scalar_subquery()
.correlate(Product)
)
)
.filter(Status.name.in_(statuses))
)
Option-2: Add computed_column
for last_status_name
First, add this property to the Product
:
Product.last_status_name = column_property(
select(Status.name.label("last_status_name"))
.filter(Status.product_id == Product.id)
.order_by(Status.created_at.desc())
.limit(1)
.scalar_subquery(),
deferred=True # prevent from always loading this from the database
)
Then you can use it whenever you need to simply by adding a filter to a query:
query = (
select(Product)
.filter(Product.last_status_name.in_(statuses))
)
And if you remote the deferred=True
above, you will always get it loaded from the database with each query.