Activity table:
----------- ----------- ------------ --------------
| player_id | device_id | event_date | games_played |
----------- ----------- ------------ --------------
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 1 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
----------- ----------- ------------ --------------
Output:
----------- ------------ ---------------------
| player_id | event_date | games_played_so_far |
----------- ------------ ---------------------
| 1 | 2016-03-01 | 5 |
| 1 | 2016-05-02 | 11 |
| 1 | 2017-06-25 | 12 |
| 3 | 2016-03-02 | 0 |
| 3 | 2018-07-03 | 5 |
----------- ------------ ---------------------
Query:
select a1.player_id, a1.event_date, sum(a2.games_played) as games_played_so_far
from activity as a1
inner join activity as a2
on a1.event_date >= a2.event_date
and a1.player_id = a2.player_id
group by a1.player_id, a1.event_date
I am not able to understand how this self inner join works. Can anyone explain it a bit. I saw few videos but not helpful so far.
CodePudding user response:
when you select the data without the group by you can see better what it does-
For every row in a2 it selects all rows from a1 that fit the criteria a1.event_date >= a2.event_date and a1.player_id = a2.player_id
and joins them together
The group by only sums all rows in a2 up that have the same player id and a1.eventdate
What you so get is a Cumulative sum.
in MySQL 8 was this made easier with window functions.
select a1.player_id, a1.event_date,a2.player_id, a2.event_date from activity as a1 inner join activity as a2 on a1.event_date >= a2.event_date and a1.player_id = a2.player_id
player_id | event_date | player_id | event_date --------: | :--------- | --------: | :--------- 1 | 2017-06-25 | 1 | 2016-03-01 1 | 2016-05-02 | 1 | 2016-03-01 1 | 2016-03-01 | 1 | 2016-03-01 1 | 2017-06-25 | 1 | 2016-05-02 1 | 2016-05-02 | 1 | 2016-05-02 1 | 2017-06-25 | 1 | 2017-06-25 3 | 2018-07-03 | 3 | 2016-03-02 3 | 2016-03-02 | 3 | 2016-03-02 3 | 2018-07-03 | 3 | 2018-07-03
select a1.player_id, a1.event_date, sum(a2.games_played) as games_played_so_far from activity as a1 inner join activity as a2 on a1.event_date >= a2.event_date and a1.player_id = a2.player_id group by a1.player_id, a1.event_date ORDER BY a1.player_id, a1.event_date
player_id | event_date | games_played_so_far --------: | :--------- | ------------------: 1 | 2016-03-01 | 5 1 | 2016-05-02 | 11 1 | 2017-06-25 | 12 3 | 2016-03-02 | 0 3 | 2018-07-03 | 5
db<>fiddle here