Home > Net >  Should database include transitive properties to avoid nullability?
Should database include transitive properties to avoid nullability?

Time:10-26

Consider this simple setup.

enter image description here

In this model, the following restrictions apply:

  • A person is either a parent or a child.
  • Only one parent per child
  • A parent has a relationship to a public utility instance

A child has a transitive relationship to the utility because of the parent.

Now, the question is: should every child have the "City utility id" property set in the database?

Advantage: You avoid nullability. It is said that databases will build less effective indexes if the field is nullable because every person that is a child will have the same value (null) on this property.

Disadvantage: Less clean, more bookkeeping on CUD operations. The field does not convey data that isn't already represented in the database.

CodePudding user response:

EF supports both Table Per Hierarchy (TPH) and Table Per Concrete Type (TPT) so as far as the schema goes you have options. While both a Parent and a Child "are a" person, they each have individual characteristics. One of which is that only Parents have a City Utility assigned, the others are that Children can only be associated to a Parent. (A parent cannot reference another parent as a child, or a child cannot reference another child as a parent.) Handling all of these scenarios within a single table is a TPH structure which relies on implicit rules to be enforced by the application code, and results in a lot of null-able references and fields for data that applies to one or the other.

Wherever possible I recommend using a TPT structure and making the relationships more explicit. This has the benefit of not relying solely on application code to ensure that relationships and "optionality vs. required" are enforced at a DB level.

This would have something like:

[Person]
PersonId [PK]
// other common fields that apply to ALL types of Person.

[Parent]
PersonId [PK, FK]
CityUtilityId [FK]
// other parent-specific fields.

[Child]
PersonId [PK, FK]
ParentPersonId [FK] (To Parent, not Person)
// other child-specific fields.

This way if a parent or child has required or optional fields, they can be put in their respective tables with the respective NULL-ability. The alternative is that the field would always be NULL-able and it's up to the application to ensure the required nature for one or the other is enforced. The DB would be free to get into a completely invalid state by mistake at any point without complaint.

There is still a lot of attraction out in the development community to minimize the number of tables which stems largely from the days when drive space was expensive and schema cost $$ so combining similar data into a single table might have made sense. Relationally though it still had significant drawbacks. With modern databases I'd argue it's always better to only combine what is effectively identical and use TPT for inheritance, or use composition.

An example of Composition would be something like an Order which has a status. That status might be Delivered and there might be details we want to record against an order when it is delivered. (signatures, etc.) These could be Null-able fields on the Order table, but they only apply to Delivered orders and would be Null in all other cases. Instead, having a table like OrderDeliveryDetails /w a 1-to-1 relationship with order which is created when an order is delivered. (And deleted/made inactive if an order changes from Delivered to another status for any reason.)

  • Related