Home > Back-end >  HIVE CBO. Wrong results with Hive SQL query with MULTIPLE IN conditions in where clause
HIVE CBO. Wrong results with Hive SQL query with MULTIPLE IN conditions in where clause

Time:05-10

I am running one SQL query in Hive and it gives different results with CBO enabled and disabled. The results are wrong when CBO is enabled (set hive.cbo.enable=true;).

Prerequisites: Apache Hadoop 2.10.1 Apache Hive 2.3.6 installed. (I tried to reproduce the issue with Apache Hive 3 version and Hadoop 3 version and they work fine.)

Actions to reproduce:

1) Create the table in Hive

CREATE TABLE sample( fname string, sname string, sid string) ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES ( 'transient_lastDdlTime'='1616020251');

2) Insert some records in the table the below values

insert into sample values ("PQ", "F", "33");

insert into sample values ("RK", "A", "11");

insert into sample values ("AZ", "B", "22");

3) Check the table

select * from sample;

PQ F 33
RK A 11
AZ B 22

4) Run the below query. It should give 0 records, instead it gives all records

select fname from sample where sid in ('11') and sid in ('22');

PQ
RK
AZ

5) The results are good when we disable cbo in Hive

set hive.cbo.enable=false;

select fname from sample where sid in ('11') and sid in ('22');

OK Time taken: 0.131 seconds

CodePudding user response:

Datatype is String use INT

And the simplest solution is do not use CBO is problem still remains but if you still want to use check these things after altering the datatype

1.Privilages 2.Version control 3.auto tuned

CodePudding user response:

Use in query to select multiple item

select fname from sample where sid in ('11','22');

  • Related