I want to use PostgreSQL's functionality in order to obtain the row that has the longest path in either ind
or tar
column, for the same id
, ind_id
and tar_id
.
For example:
At the table below, since rows 1, 2, 3 have the same id=1
, ind_id=2
and tar_id=3
, I want to only take the 3rd row because the array tar={7,8,9}
, is longer than {7,8}
(2nd row) and {7}
(1st row).
Similarly, rows #4 and #5, have the same values for id
, ind_id
and tar_id
, and since the length of the array in column ind
is longer in row #5, I want to keep this row only, and drop row #4.
Since I am a newbie in the world of PostgreSQL, can someone give me a hint how to do that?
text id ind_id tar_id ind tar
1 bla bla 1 2 3 {4} {7}
2 bla bla 1 2 3 {4} {7,8}
3 bla bla 1 2 3 {4} {7,8,9}
4 bla bla 1 2 33 {5} {10}
5 bla bla 1 2 33 {5,6,7} {10}
CodePudding user response:
Use distinct on
and order by id, ind_id, tar_id
and the greatest array length of ind
and tar
.
select distinct on (id, ind_id, tar_id)
"text", ind_id, tar_id , ind, tar, pmid
from the_table
order by id, ind_id, tar_id,
greatest(array_length(ind, 1), array_length(tar, 1)) desc;