Home > Blockchain >  Convert Oracle to Hive - SubQuery can contain only 1 item in Select List
Convert Oracle to Hive - SubQuery can contain only 1 item in Select List

Time:08-24

Code and sample data to try on your system:

CREATE TABLE "EMP" 
   (    "DR_SID" NUMBER, 
    "DR_NAME" VARCHAR2(50 BYTE) COLLATE "USING_NLS_COMP", 
    "ACTIVE_FLAG" VARCHAR2(1 BYTE) COLLATE "USING_NLS_COMP", 
    "LAST_UPDATED_TIME" TIMESTAMP (6), 
    "DATA_SOURCE" VARCHAR2(100 BYTE) COLLATE "USING_NLS_COMP", 
    "ROW_LIMIT" VARCHAR2(50 BYTE) COLLATE "USING_NLS_COMP", 
    "VERSION#" NUMBER, 
    "PARENT_DR_SID" NUMBER
   ) 
   ;
   
REM INSERTING into EMP
SET DEFINE OFF;
Insert into EMP (DR_SID,DR_NAME,LAST_UPDATED_TIME,VERSION#,PARENT_DR_SID) values (1,'this should not come1',to_timestamp('18-APR-20 05.05.52.425734000 AM','DD-MON-RR HH.MI.SSXFF AM'),1,1);
Insert into EMP (DR_SID,DR_NAME,LAST_UPDATED_TIME,VERSION#,PARENT_DR_SID) values (2,'come',to_timestamp('19-SEP-20 07.18.56.271199000 AM','DD-MON-RR HH.MI.SSXFF AM'),1,2);
Insert into EMP (DR_SID,DR_NAME,LAST_UPDATED_TIME,VERSION#,PARENT_DR_SID) values (3,'come123',to_timestamp('13-FEB-21 05.05.51.645956000 AM','DD-MON-RR HH.MI.SSXFF AM'),1,3);
Insert into EMP (DR_SID,DR_NAME,LAST_UPDATED_TIME,VERSION#,PARENT_DR_SID) values (4,'come456',to_timestamp('13-FEB-21 05.05.51.951505000 AM','DD-MON-RR HH.MI.SSXFF AM'),1,4);
Insert into EMP (DR_SID,DR_NAME,LAST_UPDATED_TIME,VERSION#,PARENT_DR_SID) values (5,'this should not come2',to_timestamp('18-APR-20 05.05.52.425734000 AM','DD-MON-RR HH.MI.SSXFF AM'),2,1);
Insert into EMP (DR_SID,DR_NAME,LAST_UPDATED_TIME,VERSION#,PARENT_DR_SID) values (6,'this should COME',to_timestamp('18-APR-20 05.05.52.425734000 AM','DD-MON-RR HH.MI.SSXFF AM'),3,1);


SELECT DR_SID, DR_NAME, LAST_UPDATED_TIME, VERSION#, PARENT_DR_SID FROM emp ;

the below query needs to be converted into Hive, can someone help?

SELECT DR_SID, DR_NAME, LAST_UPDATED_TIME, VERSION#, PARENT_DR_SID FROM emp t
where (version#,parent_dr_sid)
in (select max(version#),parent_dr_sid from emp group by parent_dr_sid)
;

I am try to find out which record is latest, so am using version# column (if version# column has the max value then the record is latest and its previous records are old and not to display).

Now how the records are linked with each other, so we have two columns, dr_sid is pk and parent_dr_sid contains same value to show this record is linked with which old record.

you can see the example here, in the given sample code, dr_sid = 1 is present 3 times in parent_dr_sid, all these 3 records of parent_dr_sid have the same value as 1 (which is linked to dr_sid).

Now I want the below o/p, can you do the same in hive?

FYI - we cant update the table so trying to update the record in this way and fetching in this way.

  
DR_SID, DR_NAME, LAST_UPDATED_TIME, VERSION#, PARENT_DR_SID
1   this should not come1   18-APR-20 05.05.52.425734000 AM 1   1
2   come    19-SEP-20 07.18.56.271199000 AM 1   2
3   come123 13-FEB-21 05.05.51.645956000 AM 1   3
4   come456 13-FEB-21 05.05.51.951505000 AM 1   4
5   this should not come2   18-APR-20 05.05.52.425734000 AM 2   1
6   this should COME    18-APR-20 05.05.52.425734000 AM 3   1

CodePudding user response:

Use left semi join to emulate in semantics for tuples.

with input as (
  select inline(array(
    (1,1),
    (1,2),
    (2,1),
    (2, 2)
  )) as (c1, c2)
)
, flt as (
  select inline(array(
    (1,1),
    (2, 2)
  )) as (f1, f2)
)
select *, split(version(), ' ')[0] as v
from input
  left semi join flt
  on input.c1 = flt.f1
  and input.c2 = flt.f2
input.c1 input.c2 v
1 1 3.1.3000.7.1.7.0-551
2 2 3.1.3000.7.1.7.0-551

CodePudding user response:

I don't know Hive so these might well be completely useless suggestions; however, see if it helps.

If you can use a subquery in FROM clause, you might do the following:

select e.*
from emp e join (select max(a.create_tm) create_tm, a.open_dt
                 from emp a group by a.open_dt
                ) x
             on x.create_tm = e.create_tm 
            and x.open_dt = e.open_dt;

Or, make your subquery return a single column by concatenating values. They look like "time" and "date" (I don't know their datatypes so you might need to apply e.g. TO_CHAR function to these columns; no problem in that, as long as it returns desired result):

select *
from emp
where concat(create_tm, open_dt) in (select concat(max(create_tm), open_dt)
                                     from emp
                                     group by open_dt);

CodePudding user response:

this is working:

SELECT DR_SID, DR_NAME, LAST_UPDATED_TIME, VERSION#--, PARENT_DR_SID 
FROM emp t join 
(select max(version#) v,parent_dr_sid from emp group by parent_dr_sid) t2
on t.version#=t2.v and t.parent_dr_sid = t2.parent_dr_sid
;
  • Related