Hive and MapReduce

Time:09-28

I recently ran into this interview question:
When you use a JOIN in HQL, does it produce MapReduce jobs? Does each join create one MapReduce job?
Case 1:
 SELECT a.name, a.age FROM tableA a JOIN tableB b ON a.name = b.name


If the query contains several join conditions, does it produce a correspondingly larger number of jobs?
Case 2:
 SELECT a.name, a.age FROM tableA a JOIN tableB b ON a.name = b.name JOIN tableB c ON a.name = c.name

CodePudding user response:

In general, each input source (table) gets its own Map (in these examples, whether there is a Reduce depends on the context; there may be none).
Take the join tableA JOIN tableB ON a.name = b.name: the Map for each of the two tables turns its rows into key-value pairs of the form name -> row. A third Map then performs the join, producing pairs of the form name -> (a.row, b.row), and the SELECT runs on that result. If there is a GROUP BY, a Reduce runs first and the SELECT comes after it.
Each MapReduce is called a Stage. Case 1 has three stages: 2 input stages and 1 join stage.
Case 2 has four stages: 3 input stages and 1 join stage.
Complex joins and GROUP BY / XXXX BY clauses also involve Shuffle and Combine.
In addition, Hive optimizes the query plan according to certain rules in order to reduce the number of MapReduce jobs (stages).
You can run a SQL statement and watch the MapReduce job for each stage on YARN to see how this works in practice.
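The stage layout can also be checked without running the query, using Hive's EXPLAIN statement, which prints the stage dependencies and the plan of each stage. A minimal sketch with the two queries from the question (tableA and tableB are the tables named there; the exact output depends on the Hive version and on settings such as hive.auto.convert.join, so none is reproduced here):

```sql
-- Case 1: show the stages Hive plans for a single join
EXPLAIN
SELECT a.name, a.age
FROM tableA a
JOIN tableB b ON a.name = b.name;

-- Case 2: two joins, both on the same key (a.name)
EXPLAIN
SELECT a.name, a.age
FROM tableA a
JOIN tableB b ON a.name = b.name
JOIN tableB c ON a.name = c.name;
```

In case 2 both joins use the same key, a.name. According to the Hive join documentation, joins over several tables that all use the same join column are converted into a single MapReduce job, so the plan may well show fewer jobs than the number of JOIN clauses suggests. Comparing the two EXPLAIN outputs is a quick way to check this on your own cluster.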