I have a table for blogs in a production site I am developing. The users can post these blogs and filter viewership to targeted audiences based on locations, roles, and groupings of users premade by them. All of these columns will be holding multiple reference ids from other tables.
I found a lot of posts about putting multiple ids in one column in the form of a separate 1:* link table. But for my needs, I have 6 columns that have this occurring. I understand that the logical solution is following the one above would be to make similar link tables for each of the columns, but is that the best course of action? How would this affect the performance or speed of the query to call all necessary data for this one row of data? Any insight is greatly appreciated, and I apologize if there was, in fact, a similar post that I had missed.
Edit:
"tile": "news",
"release_date": "--",
"scheduled_release_date": "2023-11-23",
"target_audience": "Only Staff",
"role_ids": "2",
"sub_role_ids": "4,5,7",
"region_ids": "16,2",
"district_ids": "6,207,55,94",
"campus_ids": "15,62,2699,483",
"plan_ids": "28,84,3",
"group_ids": "94,32",
"creator_ids": "Erika"
An example row of what might be found in the table.
CodePudding user response:
In a normalized database, a multi-valued attribute should be stored as rows in a child table.
The fact that you have several multi-valued attributes in this table doesn't change that design principle. Every multi-valued attribute gets its own child table.
This does unfortunately create an issue if you need to fetch all the values from multiple child tables, because they tend to multiply together. For example, if you have four district_ids
and four campus_ids
, and you join to both tables, you get sixteen rows in the result. There's a Cartesian product between the two child tables. This gets even worse if you join to more child tables that each have multiple matching rows.
One solution is to avoid joining to multiple child tables in a single query where possible. It may be better to do multiple queries, one for each child table, just to avoid the Cartesian products.
Another solution is to GROUP BY
the primary key of the parent table, and recombine the lists using GROUP_CONCAT()
. This feels like returning to using comma-separated lists, but only for query results, not for storage. You still get the benefits of normalization — avoiding redundancy and data anomalies.
Another solution would be to find a way to relate the values from the child tables, so they aren't combined in a Cartesian product. This could be done by window functions like rank or row number if there's no other natural way to relate them.
Some people try to solve this by creating a single child table for all the multi-valued attributes. This breaks Fourth Normal Form. See my answer to https://stackoverflow.com/a/7084586/20860 for explanation of what can go wrong in that solution.
CodePudding user response:
link tables for each of the columns, but is that the best course of action?
Short answer: Usually better to have the 'link tables'.
Tradeoffs:
With a commalist (like you have), it is messy and slow to look into such for a particular value. See FIND_IN_SET()
if you must.
If that is an "opaque" string that MySQL won't look into (for WHERE
, etc), then fine. Think of as a JPG. You never need to pick a row based on some pixels in the image. See GROUP_CONCAT()
for constructing a commalist from multiple rows.