I am using hive .14 for a school project (so cant use a better version), and currently I analysing data taken from a uber in NY. I am using 2 tables: yellowtaxi22, for the information regarding the trip the uber did (in this case I will only use the pickupid and the dropoff id); and zoneinfo, that simply have the names of the pickup and dropoff id. I created a query to find what where the most common trips the uber took and then I created a view out of it(this being the commontrip), the problem now is that when I want to change the ids to their corresponding names, it only selects the trips where the pickupid=dropid. Example
pulocationid | dolocationid | count |
---|---|---|
12 | 34 | 20000 |
43 | 12 | 30000 |
12 | 12 | 100 |
5 | 91 | 40000 |
34 | 34 | 70000 |
the result should be
pulocationid | dolocationid | count |
---|---|---|
JFK airport | NV | 20000 |
Midtown | JFK airport | 30000 |
JFK airport | JFK airport | 100 |
Mountain Hill | LINCON SQR | 40000 |
NV | NV | 70000 |
But the result is
pulocationid | dolocationid | count |
---|---|---|
JFK airport | JFK airport | 100 |
NV | NV | 70000 |
this is the query that I am using, zoneinfo is the table that contains the location´s name (refered as zone) and its id (whcih can be used in either the pickup or the dropoff column (or both), common trips is the name of the view that I used to count the number of times a trip was repeated, this count is named as "trips" in the query:
select /* Mapjoin(commontrip)*/ zoneinfo.zone as pickup, zoneinfo.zone as dropoff, commontrip.trips
from zoneinfo join commontrip on
(zoneinfo.locationid=commontrip.pulocationid and zoneinfo.locationid=commontrip.dolocationid);
CodePudding user response:
You're comparing zoneinfo.locationid
to both commontrip.pulocationid
and commontrip.dolocationid
, meaning those must be equal with each other as well.
Try joining only one ID column at a time with your "label table"