Home > front end >  Rows already alphabetically ordered in postgresql. How to use this information in a search?
Rows already alphabetically ordered in postgresql. How to use this information in a search?

Time:12-01

I have a postgresql table "people" with a column called "full_names" (VARCHAR(150)). These full_names I save them already alphabetically (by word order) ordered like this and keep the original order in another column:

    1. "James Henry Carl Dumas" --> "Carl Dumas Henry James"
    1. "Al Peter Karl Hito" --> "Al Hito Karl Peter"

If on top of that I sort the rows alphabetically -->

  • 1'. "Al Peter Karl Hito" --> "Al Hito Karl Peter"
  • 2'. "James Henry Carl Dumas" --> "Carl Dumas Henry James"

I basically have a lot of information about the content of the table which I can use to make fast searches of people: I want for example that the next input INPUT: "Henry Jam" or "Dumas Car" will give me a match.

In a programming language like "python" I will easily build some algorithm that easily discards a big chunk of the table and then another little chunk because such algorithm will know how the rows and the whole table is sorted, however I have no idea how to communicate POSTGRESQL this information so that it will use it.

I only found about an INDEX in the column "full_names" (the only relevant column) which will order the table rows in alphabetical order but I don't know how to communicate about the in-row order of words.

Is it possible to use some smart postgresql feature or to write the search algorithm in postgresql directly myself?

I'v researched about index, fulltext or writing sql code directly.

CodePudding user response:

  1. Once you save text values (fields, cells) that are internally sorted, they'll internally remain that way in the table.

  2. When you create an index, the table won't just "get sorted" but rather the column you use as the index key will be used to build a separate structure, a binary tree. From that point, when you query the table using that column to filter, sort, etc. PostgreSQL will detect it and instead of scanning the table, it'll cllimb/descend the tree structure, which is faster. If you won't target any other columns, it'll be even faster, because it'll retrieve the value straight from the index, without having to jump from it to the table to collect anything else.

    In order to build that index, PostgreSQL analyses the entirety of those values, along with their internals - after all, it has to be able to tell 'ab' from 'aa'. It's already informed about the "in-row order of words".

  3. If you plan to target individual parts of your values, split them:

    create table people(id serial primary key,full_name text);
    insert into people (full_name) values 
       ('James Henry Carl Dumas'),
       ('Al Peter Karl Hito');
    create table split_names as 
    select id as people_id, 
           row_number() over (partition by id order by name_part ) internal_order, 
           name_part
    from 
     ( select id, 
              regexp_split_to_table(full_name, ' ') name_part
       from people) a;
    
    -- people_id | internal_order | name_part
    ------------- ---------------- -----------
    --         1 |              1 | Carl
    --         1 |              2 | Dumas
    --         1 |              3 | Henry
    --         1 |              4 | James
    --         2 |              1 | Al
    --         2 |              2 | Hito
    --         2 |              3 | Karl
    --         2 |              4 | Peter
    
  4. If you plan to work with more complex text search and analysis, you might want to look at tsvectors and trigrams.

  • Related