SQLAlchemy - limit the joinedloaded results-CodePudding

Models:

class Team(Base):
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

    players = relationship("Player", backref="team")

class Player(Base):
    id = Column(Integer, primary_key=True)
    name = Column(String(255), unique=True)
    team_id = Column(Integer, ForeignKey("team.id"))
    positions = relationship("Position", backref="player")


class Position(Base):
    id = Column(Integer(), primary_key=True)
    name = Column(String(255), unique=True)
    player_id = Column(Integer, ForeignKey("player.id"))
    goals = relationship("Goal", backref="position")


class Goal(Base):
    id = Column(Integer(), primary_key=True)
    distance = Column(Integer)
    position_id = Column(Integer, ForeignKey("position.id"))

# Query to get all goals of all players of a team
query = (
    select(Team)
    .select_from(Player, Position, Goal)
    .options(joinedload(Team.players))
    .options(
        joinedload(
            Team.players,
            Player.positions,
        )
    )
    .options(
        joinedload(
            Team.players,
            Player.positions,
            Position.goals,
        )
    )
result = await db.execute(query)
response = result.scalar()

But now I need to limit the query to last goal of each player, instead of all goals of that player.

So I tried,

subquery = (
    select(Goal)
    .order_by(Goal.id.desc())
    .limit(1)
    .subquery()
    .lateral()
)

query = (
    select(Team)
    .select_from(Player, Position, Goal)
    .options(joinedload(Team.players))
    .options(
        joinedload(
            Team.players,
            Player.positions,
        )
    )
    .outerjoin(subquery)
    .options(
        contains_eager(
            Team.players,
            Player.positions,
            Position.goals,
            alias=subquery,
        )
    )
result = await db.execute(query)
response = result.scalar()

This fetches the last goal of any player, but not the last goal of respective player.

Filters like Goal.position_id == Position.id in outerjoin or subquery either does not work or result in error.

Edit:

It looks like I need populate_existing(), but it's not available in new select method.

CodePudding user response：

Have a look into using RANK, it may do what you need, though it would require a few queries/subqueries instead of one big joinedload.

I'd have a subquery to rank the goal dates and partition them by the positions or players, and filter it to where rank equals 1. That will get you the latest goal for each position, which you can create a dict for. With your main query, you can then use the position ID to find the latest goal using that dict.

Something like this:

# Rank goals by id and position
subquery = select(
    Goal.id.label('goal_id'),
    Goal.position_id,
    func.rank().over(order_by=Goal.id.desc(), partition_by(Goal.position_id)).label('rank'),
).subquery()

# Create dict of {position_id: latest_goal_id} to use as a lookup
latest_goal_query = (
    select(subquery.c.goal_id, subquery.c.position_id)
    .where(subquery.c.rank == 1)
)
latest_goal_ids = {pos_id: goal_id for goal_id, pos_id in session.execute(latest_goals).fetchall()}

# Get goal objects from the IDs
goal_query = select(Goal).where(Goal.id.in_(latest_goals.values()))
goals = {goal.id: goal for goal in session.execute(goal_query).scalars()}

# Map position ID to the latest goal object
latest_goals = {pos_id: goals[goal_id] for pos_id, goal_id in latest_goal_ids.items()}

# Read the team and position, and you can use the position_id to get the latest goal
query = ...

As a heads up btw - I used to attempt joinedload on everything until the author of SQLAlchemy told me that selectinload should be used when possible, because it fetches only the data you need, whereas joins may have a ton of duplicate data (eg. if your team has 20 players with 5 positions each and 20 goals each, then I think joining it all will result in each team name being sent 2000 times, and each player name being sent 100 times).

Edit: column_property just came to mind as an alternative solution. Unfortunately I've never been able to figure how to map the actual Goal model, so this isn't perfect, but here's an example of how you could add the ID of the latest goal directly to the Player model.

class Player(Base):
    ...
    latest_goal_id = column_property(
        select(Goal.id)
        .where(Goal.position.has(Position.player_id == id)),
        .order_by(Goal.id.desc()).limit(1)
    )

From the point of view of a query, it's just treated as another column, so you can select and filter by it.

CodePudding user response：

I think what you want can be achieved by using a DISTINCT ON clause to remove the duplicate rows retrieved from the Goal object:

query = (
    # Select from Goal and join all the required tables
    select(Goal)
    .join(Goal.position)
    .join(Position.player)
    .join(Player.team)
    # Remove duplicate rows based on the Player id
    .distinct(Player.id)   
    # Order by `Player.id` (required for distinct) and descending on the goal_id to have the latest added goals (newest) first
    .order_by(Player.id, Goal.id.desc())   
    )