Code and sample data to try on your system:
CREATE TABLE "EMP"
( "DR_SID" NUMBER,
"DR_NAME" VARCHAR2(50 BYTE) COLLATE "USING_NLS_COMP",
"ACTIVE_FLAG" VARCHAR2(1 BYTE) COLLATE "USING_NLS_COMP",
"LAST_UPDATED_TIME" TIMESTAMP (6),
"DATA_SOURCE" VARCHAR2(100 BYTE) COLLATE "USING_NLS_COMP",
"ROW_LIMIT" VARCHAR2(50 BYTE) COLLATE "USING_NLS_COMP",
"VERSION#" NUMBER,
"PARENT_DR_SID" NUMBER
)
;
REM INSERTING into EMP
SET DEFINE OFF;
Insert into EMP (DR_SID,DR_NAME,LAST_UPDATED_TIME,VERSION#,PARENT_DR_SID) values (1,'this should not come1',to_timestamp('18-APR-20 05.05.52.425734000 AM','DD-MON-RR HH.MI.SSXFF AM'),1,1);
Insert into EMP (DR_SID,DR_NAME,LAST_UPDATED_TIME,VERSION#,PARENT_DR_SID) values (2,'come',to_timestamp('19-SEP-20 07.18.56.271199000 AM','DD-MON-RR HH.MI.SSXFF AM'),1,2);
Insert into EMP (DR_SID,DR_NAME,LAST_UPDATED_TIME,VERSION#,PARENT_DR_SID) values (3,'come123',to_timestamp('13-FEB-21 05.05.51.645956000 AM','DD-MON-RR HH.MI.SSXFF AM'),1,3);
Insert into EMP (DR_SID,DR_NAME,LAST_UPDATED_TIME,VERSION#,PARENT_DR_SID) values (4,'come456',to_timestamp('13-FEB-21 05.05.51.951505000 AM','DD-MON-RR HH.MI.SSXFF AM'),1,4);
Insert into EMP (DR_SID,DR_NAME,LAST_UPDATED_TIME,VERSION#,PARENT_DR_SID) values (5,'this should not come2',to_timestamp('18-APR-20 05.05.52.425734000 AM','DD-MON-RR HH.MI.SSXFF AM'),2,1);
Insert into EMP (DR_SID,DR_NAME,LAST_UPDATED_TIME,VERSION#,PARENT_DR_SID) values (6,'this should COME',to_timestamp('18-APR-20 05.05.52.425734000 AM','DD-MON-RR HH.MI.SSXFF AM'),3,1);
SELECT DR_SID, DR_NAME, LAST_UPDATED_TIME, VERSION#, PARENT_DR_SID FROM emp ;
the below query needs to be converted into Hive, can someone help?
SELECT DR_SID, DR_NAME, LAST_UPDATED_TIME, VERSION#, PARENT_DR_SID FROM emp t
where (version#,parent_dr_sid)
in (select max(version#),parent_dr_sid from emp group by parent_dr_sid)
;
I am try to find out which record is latest, so am using version# column (if version# column has the max value then the record is latest and its previous records are old and not to display).
Now how the records are linked with each other, so we have two columns, dr_sid is pk
and parent_dr_sid
contains same value to show this record is linked with which old record.
you can see the example here, in the given sample code, dr_sid = 1
is present 3 times in parent_dr_sid, all these 3 records
of parent_dr_sid
have the same value as 1 (which is linked to dr_sid).
Now I want the below o/p, can you do the same in hive?
FYI - we cant update the table so trying to update the record in this way and fetching in this way.
DR_SID, DR_NAME, LAST_UPDATED_TIME, VERSION#, PARENT_DR_SID
1 this should not come1 18-APR-20 05.05.52.425734000 AM 1 1
2 come 19-SEP-20 07.18.56.271199000 AM 1 2
3 come123 13-FEB-21 05.05.51.645956000 AM 1 3
4 come456 13-FEB-21 05.05.51.951505000 AM 1 4
5 this should not come2 18-APR-20 05.05.52.425734000 AM 2 1
6 this should COME 18-APR-20 05.05.52.425734000 AM 3 1
CodePudding user response:
Use left semi join
to emulate in
semantics for tuples.
with input as (
select inline(array(
(1,1),
(1,2),
(2,1),
(2, 2)
)) as (c1, c2)
)
, flt as (
select inline(array(
(1,1),
(2, 2)
)) as (f1, f2)
)
select *, split(version(), ' ')[0] as v
from input
left semi join flt
on input.c1 = flt.f1
and input.c2 = flt.f2
input.c1 | input.c2 | v |
---|---|---|
1 | 1 | 3.1.3000.7.1.7.0-551 |
2 | 2 | 3.1.3000.7.1.7.0-551 |
CodePudding user response:
I don't know Hive so these might well be completely useless suggestions; however, see if it helps.
If you can use a subquery in FROM
clause, you might do the following:
select e.*
from emp e join (select max(a.create_tm) create_tm, a.open_dt
from emp a group by a.open_dt
) x
on x.create_tm = e.create_tm
and x.open_dt = e.open_dt;
Or, make your subquery return a single column by concatenating values. They look like "time" and "date" (I don't know their datatypes so you might need to apply e.g. TO_CHAR
function to these columns; no problem in that, as long as it returns desired result):
select *
from emp
where concat(create_tm, open_dt) in (select concat(max(create_tm), open_dt)
from emp
group by open_dt);
CodePudding user response:
this is working:
SELECT DR_SID, DR_NAME, LAST_UPDATED_TIME, VERSION#--, PARENT_DR_SID
FROM emp t join
(select max(version#) v,parent_dr_sid from emp group by parent_dr_sid) t2
on t.version#=t2.v and t.parent_dr_sid = t2.parent_dr_sid
;