PostgreSQL SELECT result with a distinct ids that prefers specific values from another table-CodePudding

I have a some_dictionary table and a some_dictionary_language table with data in multiple languages(and a reference to some_dictionary).

I need to get unique values on some_dictionary.id preferring the result for the given language and if not the default.

Given sql:

create table some_dictionary(
    id bigserial primary key,
    some_text varchar(5),
    some_array integer[]
);

create table some_dictionary_language(
    id bigserial primary key,
    some_dictionary_id bigint not null REFERENCES some_dictionary(id),
    language varchar(64) not null,
    name varchar(128) not null
);

insert into some_dictionary (some_text, some_array)
values
    ('text2', '{1, 32, 2}'),
    ('text1', '{5, 9, 1}'),
    ('text4', '{1, 97, 4}'),
    ('text3', '{616, 1, 55}'),
    ('text5', '{8, 1}'),
    ('text6', '{1}');

insert into some_dictionary_language (some_dictionary_id, language, name)
values
    (2, 'POLISH', 'nazwa2'),
    (1, 'ENGLISH', 'name1'),
    (3, 'ENGLISH', 'name3'),
    (2, 'ENGLISH', 'name2'),
    (1, 'POLISH', 'nazwa1'),
    (1, 'SPANISH', 'nombre1'),
    (4, 'SPANISH', 'nombre1'),
    (5, 'ENGLISH', '5name'),
    (6, 'ENGLISH', '6name'),
    (6, 'POLISH', 'nazwa5'),
    (5, 'POLISH', 'nazwa6');

Given conditions params:

langugage = 'POLISH' or if not, default = 'ENGLISH'
phrase in some_text or name = 'na'
element in some_array = 1
page = 1 size = 10

My select statement without distinct:

select d.id, d.some_text, d.some_array, dl.name, dl.language  
from some_dictionary d 
join some_dictionary_language dl on d.id = dl.some_dictionary_id
where dl."language" in ('POLISH', 'ENGLISH')
and (d.some_text ilike '%na%' or dl.name ilike '%na%')
and 1 = ANY(d.some_array)

select result:

d.id    d.some_text d.some_array    dl.name     dl.lanugage
2   text1       {5,9,1}     nazwa2          POLISH
1   text2       {1,32,2}    name1       ENGLISH
3   text4       {1,97,4}    name3       ENGLISH
2   text1       {5,9,1}     name2       ENGLISH
1   text2       {1,32,2}    nazwa1          POLISH
5   text5       {8,1}       5name       ENGLISH
6   text6       {1}     6name       ENGLISH
6   text6       {1}     nazwa5          POLISH
5   text5       {8,1}       nazwa6          POLISH

expected select result with distinct on d.id and prefered language POLISH else default ENGLISH:

d.id    d.some_text d.some_array    dl.name     dl.lanugage
1   text2       {1,32,2}    nazwa1          POLISH
2   text1       {5,9,1}     name2       POLISH
3   text4       {1,97,4}    name3       ENGLISH (default!)
5   text5       {8,1}       nazwa6          POLISH
6   text6       {1}     6name       POLISH

I tried to do something like this:

select distinct on (id) * from (
    select d.id, d.some_text, d.some_array, dl.name, dl.language  
    from some_dictionary d 
    join some_dictionary_language dl on d.id = dl.some_dictionary_id
    where dl."language" in ('POLISH', 'ENGLISH')
    and (d.some_text ilike '%na%' or dl.name ilike '%na%')
    and 1 = ANY(d.some_array)
    order by case when dl."language" = 'POLISH' then 1 end
) sub offset 0 row fetch next 10 rows only;

but it did not work properly:

d.id    d.some_text d.some_array    dl.name     dl.lanugage
1   text2       {1,32,2}    nazwa1          POLISH
2   text1       {5,9,1}     name2       ENGLISH
3   text4       {1,97,4}    name3       ENGLISH
5   text5       {8,1}       nazwa6          POLISH
6   text6       {1}     6name       ENGLISH

CodePudding user response：

The ORDER BY that determine how the DISTINCT ON selects its record should be on the same level as the DISTINCT ON itself, not in a subquery.

If you did that, the error message would tell you the problem, you have to sort first by the DISTINCT ON columns, then the tie-breaker columns after. Like this:

select distinct on (id) d.id, d.some_text, d.some_array, dl.name, dl.language  
from some_dictionary d 
join some_dictionary_language dl on d.id = dl.some_dictionary_id
where dl."language" in ('POLISH', 'ENGLISH')
and (d.some_text ilike '%na%' or dl.name ilike '%na%')
and 1 = ANY(d.some_array)
order by id, case when dl."language" = 'POLISH' then 1 end;

When the DISTINCT ON query doesn't have an ORDER BY, it just makes one up consisting of only the DISTINCT ON columns, leaving the kept row within each group to be selected arbitrarily.