Home > Software design >  Why does the number of returned samples where name='keyword' does not match the number of
Why does the number of returned samples where name='keyword' does not match the number of

Time:09-10

I have a Postgres table whose header is [id(uuid), name(str), arg_name(str), measurements(list), run_id(uuid), parent_id(uuid)] with a total of 237K entries.

When I want to filter for specific measurements I can use 'name', but for the majority of entries in the table 'name' == 'arg_name' and thus map to the same sample.

In my peculiar case I am interested in retrieving samples whose 'name'='TimeM12nS' and whose 'arg_name'='Time'. These two attributes point to the same samples when visually inspecting the table through PgAdmin. That is to say all entries which have arg_name='Time' also have the name='TimeM12nS' and vice-versa.

Its obvious there's a problem because of the quantity of returned samples is not the same. I first noticed the problem using django orm, but the problem is also present when I query the DB using PgAdmin.

SELECT *
FROM TableA
WHERE name='TimeM12nS'

returns 301 entries (name='TimeM12nS' and arg_name='Time' in all cases)

BUT the query:

SELECT *
FROM TableA
WHERE arg_name='Time'

returns 3945 (name='TimeM12nS' and arg_name='Time' in all cases)

I am completely stumped, anyone think they can shed some light into what's happening here?

EDIT: I should add that the query by 'arg_name' returns the 301 entries that are returned when querying by 'name'

CodePudding user response:

First let me say thank you to everyone who pitched in ideas to solve this conundrum and especially to JGH for the solution (found in the comments of the original post).

Indeed the problem was a indexing issue. After re-indexing the queries return the same number of entries '3945' as expected.

In Postgress re-indexing a table can be achieved through pgAdmin by navigating to Databases > 'database_name' > Schemas > Tables then right-clicking on the table_name selecting Maintenance and pressing the REINDEX button.

or more simply by running the following command

REINDEX TABLE table_name

Postgress Re-Indexing Docs

CodePudding user response:

Without access to the database, it's not possibly to give a definitive answer. All I can provide is the next query that I would use in this case.

SELECT COUNT(*), LENGTH(name), name, arg_name
FROM TableA
WHERE arg_name='Time'
GROUP BY name, arg_name;

This should show you any differences in the name column that you aren't able to see. The length of that string could also be informative.

  • Related