I am running one SQL query in Hive and it gives different results with CBO enabled and disabled. The results are wrong when CBO is enabled (set hive.cbo.enable=true;).
Prerequisites: Apache Hadoop 2.10.1 Apache Hive 2.3.6 installed. (I tried to reproduce the issue with Apache Hive 3 version and Hadoop 3 version and they work fine.)
Actions to reproduce:
1) Create the table in Hive
CREATE TABLE
sample
(fname
string,sname
string,sid
string) ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES ( 'transient_lastDdlTime'='1616020251');
2) Insert some records in the table the below values
insert into sample values ("PQ", "F", "33");
insert into sample values ("RK", "A", "11");
insert into sample values ("AZ", "B", "22");
3) Check the table
select * from sample;
PQ F 33
RK A 11
AZ B 22
4) Run the below query. It should give 0 records, instead it gives all records
select fname from sample where sid in ('11') and sid in ('22');
PQ
RK
AZ
5) The results are good when we disable cbo in Hive
set hive.cbo.enable=false;
select fname from sample where sid in ('11') and sid in ('22');
OK Time taken: 0.131 seconds
CodePudding user response:
Datatype is String use INT
And the simplest solution is do not use CBO is problem still remains but if you still want to use check these things after altering the datatype
1.Privilages 2.Version control 3.auto tuned
CodePudding user response:
Use in query to select multiple item
select fname from sample where sid in ('11','22');