I have three columns: sending organization, receiving organization, and number of participants. How can I show, for each sending organization, the receiving organization with the largest number of participants? Some rows have the same values for Sending and Receiving, so I have to sum the participants for those rows.
Example:
SENDING | RECEIVING | PARTECIPANTS |
---|---|---|
A | B | 10 |
A | C | 9 |
A | B | 2 |
B | C | 11 |
B | D | 4 |
B | D | 5 |
C | E | 1 |
Expected output:
SENDING | RECEIVING |
---|---|
A | B |
B | C |
C | E |
CodePudding user response:
I think the best way will be to use a window function:
SELECT DISTINCT sending,
       FIRST_VALUE(receiving) OVER ( PARTITION BY sending ORDER BY partecipants DESC ) AS receiving
FROM ttable
I did not quite understand the part of your question about summing values. If you do need the sums, you can replace ttable with a subquery (SELECT SUM(...) ....) and order by the summed column instead.
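The subquery mentioned above could be filled in roughly like this. It is only a sketch: `total` and `s` are aliases made up for illustration, and I have not run it on Hive. The inner query sums the participants per (sending, receiving) pair, and the window function then picks the receiver with the highest sum for each sender. DISTINCT is there because a window function returns one row per input row, not one per sender.

```sql
-- Sketch only: `total` and `s` are illustrative aliases, not from the question.
SELECT DISTINCT
    sending,
    FIRST_VALUE(receiving) OVER (
        PARTITION BY sending
        ORDER BY total DESC
    ) AS receiving
FROM (
    -- First collapse duplicate (sending, receiving) pairs into one summed row
    SELECT sending, receiving, SUM(partecipants) AS total
    FROM ttable
    GROUP BY sending, receiving
) s;
```

If two receivers tie on the summed value for the same sender, which one FIRST_VALUE returns is not defined unless you add a tiebreaker to the ORDER BY.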
CodePudding user response:
Disclaimer: this query was tested in MySQL, not Hive.
You can do:
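select sending, receiving
from (
  select sending, receiving,
         -- rank receivers within each sender by their summed participants
         rank() over(partition by sending order by sum(partecipants) desc) as rk
  from t
  group by sending, receiving
) x
where rk = 1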
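To try this against the sample data from the question without creating a table, something like the following should work. It is a sketch under a few assumptions: the inline `t` built with UNION ALL stands in for the real table, the rank is computed per sending organization (partition by sending), and the engine must support both CTEs and window functions over aggregates (MySQL 8+ does; I have not checked it on Hive).

```sql
-- Sample rows from the question, inlined as a CTE (stand-in for the real table)
with t as (
  select 'A' as sending, 'B' as receiving, 10 as partecipants
  union all select 'A', 'C', 9
  union all select 'A', 'B', 2
  union all select 'B', 'C', 11
  union all select 'B', 'D', 4
  union all select 'B', 'D', 5
  union all select 'C', 'E', 1
)
select sending, receiving
from (
  select sending, receiving,
         -- sums: A-B 12, A-C 9, B-C 11, B-D 9, C-E 1; ranked within each sender
         rank() over(partition by sending order by sum(partecipants) desc) as rk
  from t
  group by sending, receiving
) x
where rk = 1
order by sending;
-- Expected rows: (A, B), (B, C), (C, E)
```

Note that rank() keeps every receiver tied for first place, so a sender can appear more than once. If you need exactly one row per sender, row_number() in place of rank() does that, at the cost of picking among ties arbitrarily.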