Home > Back-end >  SQL database with a column being a list or a set
SQL database with a column being a list or a set

Time:10-22

With a SQL database (in my case Sqlite, using Python), what is a standard way to have a column which is a set of elements?

id  name       items_set
1   Foo        apples,oranges,tomatoes,ananas
2   Bar        tomatoes,bananas
...

A simple implementation is using

CREATE TABLE data(id int, name text, items_set text);

but there are a few drawbacks:

  • to query all rows that have ananas, we have to use items_set LIKE '%ananas%' and some tricks with separators to avoid querying "ananas" to also return rows with "bananas", etc.

  • when we insert a new item in one row, we have to load the whole items_set, and see if the item is already in the list or not, before concatenating ,newitem at the end.

  • etc.

There is surely better, what is a standard SQL solution for a column which is a list or set?

Note: I don't know in advance all the possible values for the set/list.

I can see a solution with a few additional tables, but in my tests, it multiplies the size on disk by a factor x2 or x3, which is a problem with many gigabytes of data.

Is there a better solution?

CodePudding user response:

To have a well structured SQL database, you should extract the items to their own table and use a join table between the main table and the items table

I'm not familiar with the Sqlite syntax but you should be able to create the tables with

CREATE TABLE entities(id int, name text);  
CREATE TABLE entity_items(entity_id int, item_id int);  
CREATE TABLE items(id int, name text);

add data

INSERT INTO entities (name) VALUES ('Foo'), ('Bar');  
INSERT INTO items (name) VALUES ('tomatoes'), ('ananas'), ('bananas');  
INSERT INTO entity_items (entity_id, item_id) VALUES (
  (SELECT id from entities WHERE name='Foo'),
  (SELECT id from items WHERE name='bananas')
);

query data

SELECT * FROM entities
LEFT JOIN entity_items
ON entities.id = entity_items.entity_id
LEFT JOIN items
ON items.id = entity_items.item_id
WHERE items.name = 'bananas';

CodePudding user response:

In Mysql, You have set Type

Creation:

CREATE TABLE myset (col SET('a', 'b', 'c', 'd'));

Select:

mysql> SELECT * FROM tbl_name WHERE FIND_IN_SET('value',set_col)>0;
mysql> SELECT * FROM tbl_name WHERE set_col LIKE '%value%';

Insertion:

INSERT INTO myset (col) VALUES ('a,d'), ('d,a'), ('a,d,a'), ('a,d,d'), ('d,a,d');

CodePudding user response:

You have probably two options. One standard approach, which is more conventional, is many-to-many relationship. Like you have three tables, for example, Employees, Projects, and ProjectEmployees. The latter describes your many-to-many relationship (each employee can work on multiple projects, each project has a team).

Having a set in a single value denormalized the table and it will complicate the things either way. But if you just, use the JSON format and the JSON functionality provided by SQLite. If your SQLite version is not recent, it may not have the JSON extension built in. You would need either updated (best option) or load the JSON extension dynamically. Not sure if you can do it using the SQLite copy supplied with Python.

CodePudding user response:

To elaborate on what @ussu said, ideally your table would have one row per thing & item pair, using IDs instead of names:

id  thing_id   item_id
1   1          1
2   1          2
3   1          3
4   1          4
5   2          3
5   2          4

Then look-up tables for the thing and item names:

id  name
1   Foo
2   Bar

id  name
1   apples
2   oranges
3   tomatoes
4   bananas
  • Related