Same SQL regexp_extract, different Impala and Hive output. Why?

Time:03-26

The same SQL command produces different output on Hive and Impala:

select regexp_extract('AbcffdBCdeffffGHI','.*?(f+)',1);

Hive output: ff

Impala output: ffff

Why is there a difference? Please explain it in terms of how each engine processes the input and produces its output, character by character, whether from left to right or right to left, step by step, along with the reasoning, the logic, and each engine's implementation. Any discussion of the difference also needs to address convention: what is the conventional behavior, and which of these outputs conforms to it?

The command above has been executed on both Hive and Impala, and the outputs are as stated.
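
For comparison outside both engines, here is a minimal Python sketch of how the two reported values can arise from the same input. It uses Python's `re` module, which is not the code Hive or Impala actually run, and it assumes the capture group is meant to be `(f+)`, since a group written as `(f )` requires a literal space that never occurs in the input. The variable names are only illustrative.

```python
import re

# Input string from the question.
s = "AbcffdBCdeffffGHI"

# Taken literally, '(f )' needs an 'f' followed by a space, so nothing matches.
literal = re.search(r".*?(f )", s)

# Assumption: the intended group is (f+).
# Lazy prefix: .*? consumes as little as possible, so the group
# captures the first run of 'f' found when scanning left to right.
first_run = re.search(r".*?(f+)", s).group(1)

# Greedy prefix: .* consumes as much as possible and backtracks
# only far enough to leave a single 'f' for the group.
greedy_tail = re.search(r".*(f+)", s).group(1)

# All non-overlapping runs of 'f', left to right, and the last of them.
runs = re.findall(r"f+", s)
last_run = runs[-1]

print(literal)      # None
print(first_run)    # ff
print(greedy_tail)  # f
print(runs)         # ['ff', 'ffff']
print(last_run)     # ffff
```

Under these assumptions, the leftmost lazy match gives `ff`, which is the value Hive reports. The value Impala reports, `ffff`, is the last (and longest) run of `f` in the input; a plain greedy prefix does not produce it. Whether Impala arrives at `ffff` by returning the last match, the longest match, or by some other route is exactly what the question is asking, and this sketch does not settle it.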

These places have been searched and offer no explanation for the question asked:

Hive

...but I'd be keen to see Cloudera offer a more involved explanation as to why their regex matching here is unconventional.
