How to build data modeling with a child table being used by two or more parent tables-CodePudding

In a simple example where I have a Buyer table and a Suplier table. How to model a Phone table with the two parent tables.

UPDATE Would this be an appropriate approach?

Note: When using a junction table, wouldn't it be difficult, for example, querying a phone number in the database to know if it belongs to a Supplier or a Buyer?

UPDATE

After considering, I believe that the best solution in this case would be to create a base table and call it Entities, so the other entities that have a telephone must have a record in this table. Although the Buyer and Supplier tables in the example have the same fields (just for ease), in the real scenario they are two tables with very different fields.

CodePudding user response：

This is a big question.

For the duration of this discussion, I will keep in frame your follow-up comment, which was basically a question of convenience: given a phone number, it would be nice to be able to easily know who/what it belonged to.

Suppose we kept the direction of the dependency as per your original diagram, where the phone number "pointed to" its buyer or supplier. What does that look like?

create table PhoneNumbers
(
   phoneNumber varchar(16) primary key, 
   buyerId int foreign key references Buyers,
   supplierId int foreign key references suppliers
);

The clear limitation here is that if you add more kinds of "things that have phone numbers", you have to add more and more columns to the phone numbers table. It's also not clear whether a single phone number should be able to be for both a buyer and a supplier. Can only one of the associated Id's be not null? That becomes an annoying check constraint to write if more related things get added.

But, having said that, the solution is fairly trivial. If you absolutely positively know for sure you only ever have two related types, you can "get away with" this. It's bad, but we recognise that it's bad and do it anyway.

But is this convenient? What does the query look like?

select    relatedThingType = case
             when b.buyerId is not null then 'buyer'
             when s.supplierId is not null then 'supplier'
             else '????'
          end     
from      PhoneNumbers p
left join Buyers       b on b.buyerId = p.buyerId
left join Suppliers    s on s.supplierId = p.supplierId
where     p.phoneNumber = @phoneNumber

Not super convenient. We're fighting against the schema here, guarding against situations where both are null, arbitrarily deciding by virtue of the order of our case statement that if both foreign keys are populated, we will choose a buyer over a supplier, and so on.

OK, what if we turn that upside down as suggested, and as per your updated diagram?

create table PhoneNumbers
(
   phoneNumber varchar(16) primary key
);

create table Buyers
(
   buyerId int primary key,
   buyerName varchar(32),
   phoneNumber varchar(16) foreign key references phoneNumbers
);

create table Suppliers
(
   supplierId int primary key,
   supplierName varchar(32),
   phoneNumber varchar(16) foreign key references phoneNumbers
);

One thing is immediately apparent: Given a phone number, we don't actually have to join to the PhoneNumbers table to find out who owns it - as long as we don't use a surrogate Id and instead use the phone number itself as the key.

In fact, tables with a single column (which is therefore necessarily the key) never serve a purpose in queries. Their only function is to enforce referential integrity (and make cascade operations available).

OK, query time

select   relatedThingType = 'buyer'
from     Buyers 
where    phoneNumber = @phoneNumber
union all
select   'supplier'
from     Suppliers
where    phoneNumber = @phoneNumber;

Essentially, join becomes union. Is this more convenient? If there was some metric for "overall size of query structure", it would be about the same, I think. But we aren't fighting the schema quite as much anymore. We also "naturally" allow both a buyer and a supplier to have the phone number, and if that happens, both will be returned.

So there's functional differences between these models in terms of available cardinality, and that also means there's a difference in the query semantics. So far the choice isn't really based on "what's more convenient", it's based on "what is the cardinality your model needs to enforce"?

But... this version is slightly more convenient. Adding a new relatedThingType doesn't require that we change any schema. We just append another union all to the query.

OK, big reveal time. I've been deliberately using this awkward term relatedThingType. We obviously don't have a clear and distinct idea of what this is. How do we fix that?

The thing is, what you're essentially doing is saying "Both buyers and suppliers have a shared aspect: they are both things that can have phone numbers. But they are different in other respects".

If you do any non-sql programming, that smells suspiciously like an inheritance relationship. And that gets hinted at a little more when you look at the schemas of those two tables. They both have names. So they share a common attribute. In OOAD, that would be a field on the base class.

So one way to solve this is to explicitly acknowledge that base class. Both buyers and suppliers are parties, and a party can have a phone number.

create table Parties 
(
   partyType varchar(16) check (partyType in ('buyer', 'supplier')),
   partyId int primary key,
   phoneNumber varchar(16),   
   partyName varchar(32),
   constraint pk_p unique (partyType, partyId)
);

create table Buyers
(
   partyId int primary key,
   partyType varchar(16) default 'buyer' check (partyType = 'buyer'),
   -- other buyer specific columns
   constraint fk_b_p foreign key (partyType, partyId) references Parties(partyType, partyId)
);

create table Suppliers
(
   partyId int primary key,
   partyType varchar(16) default 'supplier' check (partyType = 'supplier'),
   -- other supplier specific columns
   foreign key (partyType, partyId) references Parties(partyType, partyId)
);

Doesn't look very convenient. But wait, that's the schema. What about the query we were talking about?

select  partyType, partyName, phoneNumber
from    Parties
where   phoneNumber = @phoneNumber

That's pretty convenient.

Of course, you could get this same result with a view, the definition of which would be essentially the same as the union all query I wrote earlier. Whether or not you actually want to "instantiate the base class" is just a choice. We'll get to how to make that choice in a second...

But what's with those weird default and check constraints in the schema? What they're doing is enforcing the rule that a party can only be either a supplier, or a buyer, but not both. We can never get a buyer and a supplier accidentally sharing a partyId. This is an "exclusive subtype" relationship. If you need to enforce this, you need to instantiate the base class. A view won't cut it.

We can go even further, because the previous paragraph implies another possible case: the case of an inclusive subtype relationship. A party could be both a buyer, and a supplier. For this, you would move into a "role based modelling" schema, where a Party has one or more Roles, determined by a PartyRoles junction table. I typically call this the "CounterParties" table, because each row represents some party playing some role with respect to our point of view - ie, they're a counterparty of some specific kind, defined by the role.

OK, what's the schema for that? Well... I'm not going to actually do this one. I'll leave it as an exercise.

Well that's all very nice and academic but you haven't given me an answer!

That's true. But the idea here is not to give "the one true answer". Hopefully what all this has shown is that the right answer for you depends on precisely what your model needs to model. It depends on your domain. Can buyers also be suppliers? Can parties have multiple phone numbers? Can one phone number be related to multiple buyers? The answers to these kinds of these questions place strict requirements on what your schema must do, and what it must not do. Any schema you build has to adhere to those constraints. The various options I've provided each adhere to a different set of domain constraints, they're not all equivalent.

So, the answer to your question is... you're going to hate me...

"it depends".

^{Please excuse any syntax errors in the DDL. It's just there to illustrate concepts}