I'm following one of the BigQuery courses from Google's Skills Boost program. Using a dataset with football (soccer) stats, they calculate the impact of shot distance on the likelihood of scoring a goal.
I don't quite get how the shot distance is calculated in this part:
SQRT(
POW(
(100 - positions[ORDINAL(1)].x) * 105/100,
2) +
POW(
(50 - positions[ORDINAL(1)].y) * 68/100,
2)
) AS shotDistance
I know the distance formula is used (d = √((x_2 - x_1)² + (y_2 - y_1)²)), but:
- why use ORDINAL(1)? How does it work in this example?
- why subtract first from 100 and then from 50?
For the record, positions is a repeated field with x and y (INT64) nested underneath. x and y take values between 1 and 100 and give the position, as a percentage of the pitch, where an event (e.g. a pass) was initiated or terminated.
The whole code is as follows:
WITH
Shots AS
(
SELECT
*,
/* 101 is known Tag for 'goals' from goals table */
(101 IN UNNEST(tags.id)) AS isGoal,
/* Translate 0-100 (x,y) coordinate-based distances to absolute positions
using "average" field dimensions of 105x68 before combining in 2D dist calc */
SQRT(
POW(
(100 - positions[ORDINAL(1)].x) * 105/100,
2) +
POW(
(50 - positions[ORDINAL(1)].y) * 68/100,
2)
) AS shotDistance
FROM
`soccer.events`
WHERE
/* Includes both "open play" & free kick shots (including penalties) */
eventName = 'Shot' OR
(eventName = 'Free Kick' AND subEventName IN ('Free kick shot', 'Penalty'))
)
SELECT
ROUND(shotDistance, 0) AS ShotDistRound0,
COUNT(*) AS numShots,
SUM(IF(isGoal, 1, 0)) AS numGoals,
AVG(IF(isGoal, 1, 0)) AS goalPct
FROM
Shots
WHERE
shotDistance <= 50
GROUP BY
ShotDistRound0
ORDER BY
ShotDistRound0
Thanks
CodePudding user response:
why use ORDINAL(1)? How does it work in this example?
As per the BigQuery array documentation:
To access elements from the arrays in this column, you must specify which type of indexing you want to use: either OFFSET, for zero-based indexes, or ORDINAL, for one-based indexes.
So, to access the first element of a sample array, you would do the following:
array = [7, 5, 8]
array[OFFSET(0)] = 7
array[ORDINAL(1)] = 7
So in this example it is used to get the coordinates of where the shot took place (which in this data is the first set of x,y coordinates).
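To make the indexing concrete, here is a minimal sketch in BigQuery Standard SQL. The `demo` CTE and its two coordinate pairs are made up for illustration and only stand in for the real `positions` field; they are not taken from the `soccer.events` table.

```sql
-- Illustrative only: a literal array of structs stands in for the repeated `positions` field.
WITH demo AS (
  SELECT [STRUCT(88 AS x, 45 AS y), STRUCT(100 AS x, 50 AS y)] AS positions
)
SELECT
  positions[OFFSET(0)].x       AS first_x_zero_based,  -- first element, zero-based index
  positions[ORDINAL(1)].x      AS first_x_one_based,   -- same first element, one-based index
  positions[ORDINAL(2)].x      AS second_x,            -- second element
  positions[SAFE_ORDINAL(3)].x AS missing_x            -- NULL rather than an out-of-bounds error
FROM demo;
```

OFFSET(0) and ORDINAL(1) pick the same element, so the course could have used either. The SAFE_ variants are handy when some rows might have fewer elements than expected.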
why subtract first from 100 and then from 50?
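100 and 50 are the x and y coordinates of the centre of the goal being shot at, which is assumed to be the end point of the shot.
The x axis runs from 0 to 100 along the length of the field, with 100 being the goal line. On the y axis the goal sits halfway between the two sidelines, so its midpoint is at 50. Subtracting the shot's x from 100 and its y from 50 therefore gives the offsets from the centre of the goal as percentages of the pitch, and multiplying by 105/100 and 68/100 converts them to absolute distances on the assumed 105x68 field before they go into the distance formula.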
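As a worked example of the arithmetic, the sketch below plugs a hypothetical shot taken at x = 90, y = 40 into the same expression. The coordinates are invented for illustration, and the 105x68 dimensions are the "average" field size assumed in the course query.

```sql
-- Hypothetical shot at (x = 90, y = 40); goal centre assumed at (100, 50).
SELECT
  (100 - 90) * 105 / 100 AS dx,   -- 10% of the pitch length -> 10.5
  (50 - 40) * 68 / 100   AS dy,   -- 10% of the pitch width  -> 6.8
  SQRT(
    POW((100 - 90) * 105 / 100, 2) +
    POW((50 - 40) * 68 / 100, 2)
  ) AS shotDistance;              -- SQRT(110.25 + 46.24), roughly 12.5
```

The two axes are scaled separately because one percentage point covers a different absolute distance along the length of the field than across its width.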