Home > Blockchain >  Architecture for a large number of entries with many columns searchable by range
Architecture for a large number of entries with many columns searchable by range

Time:11-07

The database consists of one main table with one million entries each with approximately 100 fields.

  1. Most fields are numeric and searching will be by ranges, not exact values.
  2. It will be possible to search for multiple fields simultaneously (only AND, not OR)
  3. Each entry has data for each column
  4. Values calculated on the basis of others and also searchable
  5. Easily add/remove columns

Which architecture will be the most appropriate?

  • Is it better to use SQL or NOSQL? Maybe elastic search?
  • If SQL then should the data be stored "traditional" way (big number of columns)

CodePudding user response:

A table structure like that is harder to optimise in SQL and will involve many different indexes on the same table that will result in a large storage footprint.

Easily add/remove columns

This requirement usually points us in the direction of NOSQL but in SQL this can be achieved by modifying the query or view, If you are adding or removing columns then that indicates that the data is either referenced from other existing tables or there will be some form of CRUD or data manipulation interface. SQL vs NOSQL in this instance comes down to the type of data updates you want to perform and if they are per-document or if you need bulk update support. In most realities, the effort required to achieve this in SQL vs NOSQL should be the same and is only limited by your knowledge on how to achieve the task in your chosen architecture.

with approximately 100 fields.

This indicates that the data is not in a normalized form, which means many of the benefits that we get from SQL over NOSQL are not being utilized.

It is not that SQL is bad for this type data, its just not a compelling argument to setup a whole SQL database for a single table like this compared to using NOSQL or a dedicated search API like elastic search.

If you already have a database schema that is managing other data aspects of this business data and are not concerned by the performance or the cost and effort to achieve optimal performance, then it can work well in SQL.

The best search or read performance based approach would be to use a dedicated search index provider. Elastic search is a great option, I use Azure Cognitive Search which is similar, the data is still stored in the database, though usually in a normalized form, then a projection of the data or a view is uploaded to the search index provider. Application search queries are performed against the index, from the results in the index the application can then re-query from the database if required or after the user has selected a specific record from the results.

If using SQL then there isn't really a simple rule that states if you should normalise the dataset or not. Normalising helps mostly with managing and updating data, it will also help by having a good set of default indexes on the data, so that a view with over 100 fields can be efficiently queried. Without normalizing you will need to tune the database with indexes to support your search arguments.

  • Related