looking for some help on how to create a game_id variable in SQL. I'm working on an nba project and I can manipulate the data into team, date, opponent format where every team has a row for each game they play. A game_id variable would make my life easier for other work in the project but I don't know how to create it.
The variable itself can start from 1 or 100000, doesn't matter. I just need it to uniquely identify every game that is being played.
Below is an example table data you can create to see my dilemma. Ideally the LAL and GSW rows would both have the same game_id, and the BKN and MIL rows would have the same game_id.
CREATE TABLE basketball_data (
team text,
dategame date,
opponent text
);
INSERT INTO basketball_data (team, dategame, opponent)
VALUES ('GSW', '2021-10-19', 'LAL');
INSERT INTO basketball_data (team, dategame, opponent)
VALUES ('LAL', '2021-10-19', 'GSW');
INSERT INTO basketball_data (team, dategame, opponent)
VALUES ('BKN', '2021-10-19', 'MIL');
INSERT INTO basketball_data (team, dategame, opponent)
VALUES ('MIL', '2021-10-19', 'BKN');
Anyone have an idea of what would be a way of creating a variable like this? If it makes a difference, I'm working in PostgreSQL. Thanks!
CodePudding user response:
You may try the following using DENSE_RANK as a window function:
Retrieving a game id during queries
SELECT
DENSE_RANK() OVER (
ORDER BY
dategame,(
CASE
WHEN team < opponent THEN CONCAT(team,opponent)
ELSE CONCAT(opponent,team)
END
)
) as game_id,
team,
dategame,
opponent
FROM
basketball_data;
game_id | team | dategame | opponent |
---|---|---|---|
1 | BKN | 2021-10-19T00:00:00.000Z | MIL |
1 | MIL | 2021-10-19T00:00:00.000Z | BKN |
2 | GSW | 2021-10-19T00:00:00.000Z | LAL |
2 | LAL | 2021-10-19T00:00:00.000Z | GSW |
Creating a new table with the same data and game id
CREATE TABLE basketball_data_with_game_id AS
SELECT
DENSE_RANK() OVER (
ORDER BY
dategame,(
CASE
WHEN team < opponent THEN CONCAT(team,opponent)
ELSE CONCAT(opponent,team)
END
)
) as game_id,
team,
dategame,
opponent
FROM
basketball_data;
There are no results to be displayed.
SELECT * FROM basketball_data_with_game_id;
game_id | team | dategame | opponent |
---|---|---|---|
1 | BKN | 2021-10-19T00:00:00.000Z | MIL |
1 | MIL | 2021-10-19T00:00:00.000Z | BKN |
2 | GSW | 2021-10-19T00:00:00.000Z | LAL |
2 | LAL | 2021-10-19T00:00:00.000Z | GSW |
Updating the existing table to have the game id
ALTER TABLE basketball_data
ADD game_id INT DEFAULT 0;
There are no results to be displayed.
UPDATE basketball_data
SET game_id = n.game_id
FROM (
SELECT
DENSE_RANK() OVER (
ORDER BY
dategame,(
CASE
WHEN team < opponent THEN CONCAT(team,opponent)
ELSE CONCAT(opponent,team)
END
)
) as game_id,
team,
dategame,
opponent
FROM
basketball_data
) n
WHERE basketball_data.game_id=0 AND
basketball_data.team=n.team AND
basketball_data.dategame=n.dategame AND
basketball_data.opponent=n.opponent;
There are no results to be displayed.
SELECT * FROM basketball_data;
team | dategame | opponent | game_id |
---|---|---|---|
BKN | 2021-10-19T00:00:00.000Z | MIL | 1 |
MIL | 2021-10-19T00:00:00.000Z | BKN | 1 |
GSW | 2021-10-19T00:00:00.000Z | LAL | 2 |
LAL | 2021-10-19T00:00:00.000Z | GSW | 2 |
Let me know if this works for you.