Home > Software engineering >  Why referencing table two times instead of just once
Why referencing table two times instead of just once

Time:03-03

I don't understand why in the below query there has to be reference to zones table twice (as zpu and zdo). When I try modified query with one "zones" table instead of splitting it into zpu and zdo I have different outcome. The outcome table has to have two columns, one with pickup location and second with dropoff location. Locations are contained withing "zones" table.

Can't figure out why is that and what am I lacking in SQL mechanics.

Tables:

Trips

tpep_pickup_datetime tpep_dropoff_datetime total_amount PULocationID DOLocationID
"2021-01-01 00:30:10" "2021-01-01 00:36:12" 11.8 142 43
"2021-01-01 00:51:20" "2021-01-01 00:52:19" 4.3 238 151
"2021-01-01 00:43:30" "2021-01-01 01:11:06" 51.95 132 165
"2021-01-01 00:15:48" "2021-01-01 00:31:01" 36.35 138 132
"2021-01-01 00:31:49" "2021-01-01 00:48:21" 24.36 68 33
"2021-01-01 00:16:29" "2021-01-01 00:24:30" 14.15 224 68
"2021-01-01 00:00:28" "2021-01-01 00:17:28" 17.3 95 157
"2021-01-01 00:12:29" "2021-01-01 00:30:34" 21.8 90 40

Zones

LocationID Borough Zone
1 EWR Newark Airport
2 Queens Jamaica Bay
3 Bronx Allerton/Pelham Gardens
4 Manhattan Alphabet City Yellow Zone
5 Staten Island Arden Heights
6 Staten Island Arrochar/Fort Wadsworth
7 Queens Astoria
8 Queens Astoria Park
9 Queens Auburndale
10 Queens Baisley Park

Query

select
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    total_amount,
    CONCAT(zpu."Borough", ' / ', zpu."Zone") AS "pick_up_loc",
    CONCAT(zdo."Borough", ' / ', zpu."Zone") AS "drop_off_loc"
from
    trips t,
    zones zpu,
    zones zdo
WHERE
    t."PULocationID" = zpu."LocationID" AND
    t."DOLocationID" = zdo."LocationID"

Outcome (good table):

tpep_pickup_datetime tpep_dropoff_datetime total_amount pick_up_loc drop_off_loc
2021-01-01 00:30:10 2021-01-01 00:36:12 11.8 Manhattan / Lincoln Square East Manhattan / Lincoln Square East
2021-01-01 00:51:20 2021-01-01 00:52:19 4.3 Manhattan / Upper West Side North Manhattan / Upper West Side North
2021-01-01 00:43:30 2021-01-01 01:11:06 51.95 Queens / JFK Airport Brooklyn / JFK Airport
2021-01-01 00:15:48 2021-01-01 00:31:01 36.35 Queens / LaGuardia Airport Queens / LaGuardia Airport
2021-01-01 00:31:49 2021-01-01 00:48:21 24.36 Manhattan / East Chelsea Brooklyn / East Chelsea
2021-01-01 00:16:29 2021-01-01 00:24:30 14.15 Manhattan / Stuy Town/Peter Cooper Village Manhattan / Stuy Town/Peter Cooper Village
2021-01-01 00:00:28 2021-01-01 00:17:28 17.3 Queens / Forest Hills Queens / Forest Hills
2021-01-01 00:12:29 2021-01-01 00:30:34 21.8 Manhattan / Flatiron Brooklyn / Flatiron
2021-01-01 00:39:16 2021-01-01 01:00:13 28.8 Brooklyn / Fort Greene Queens / Fort Greene
2021-01-01 00:26:12 2021-01-01 00:39:46 18.95 Manhattan / Yorkville West Manhattan / Yorkville West

CodePudding user response:

You select trips and join addresses, and you wonder why you are forced to join the addresses again and why it doesn't suffice to tell the DBMS once to get you the addresses.

This misunderstanding is probably bolstered by the ancient join syntax you are using - a syntax used in the 1980s mainly until explicit joins made it into the SQL standard in 1992.

When you join tables, you actuall join table rows.

With select * from trips, zones you join every trip row with every zone row. With eight trip rows and 10 zones you get 80 result rows. So you joined all ten zones to each trip. With select * from trips t, zones zpu, zones zdo you get a result of 800 rows, each trip combined with all ten zones and these again combined with all ten zones, i.e. all 100 possible zone combinations per trip.

In your query you have WHERE t.pulocationid = zpu.locationid AND t.dolocationid = zdo.locationid. For one trip there is only one of its 100 intermediate result rows that match this criteria and the other 99 are dismissed.

Now, if you joined the zones table only once, you had ten rows per trip in your intermediate result, each row would contain one zone. Now to your criteria, there are two options: use AND or OR:

select * from trips t, zones z
where t."PULocationID" = z."LocationID" AND t."DOLocationID" = z."LocationID"

This would only select rows for which t.pulocationid = t.dolocationid, because the criteria demand them to both match the one zone in the joined row. In your sample data there is not a single trip where the two locations are equal. You would get an empty result set.

select * from trips t, zones z
where t."PULocationID" = z."LocationID" OR t."DOLocationID" = z."LocationID"

With this query you would get the row where the zone matches the t.pulocationid and the row where the zone matches the t.dolocationid. I.e. you would get two result rows per trip, one with the pickup zone, one with the drop off zone.

But you want one result row per trip with both zones in that same result row.

Explicit joins make this more readable:

select
  tpep_pickup_datetime,
  tpep_dropoff_datetime,
  total_amount,
  CONCAT(zpu."Borough", ' / ', zpu."Zone") AS "pick_up_loc",
  CONCAT(zdo."Borough", ' / ', zpu."Zone") AS "drop_off_loc"
from trips t
inner join zones zpu on zpu."LocationID" = t."PULocationID"
inner join zones zdo on zdo."LocationID" = t."DOLocationID";

This does exactly the same thing as your query, but it is easier to read. Take a trip, join the pickup location row, join the drop off location row.

  •  Tags:  
  • sql
  • Related