Currently I have a table users, a table chats, however I want there to be "Group chats" and "Private chats (dm)". A group chat needs more data than a private chat, such as for example: Group name, picture, ....
What is the best way to approach this? Do I make 1 table chats, and put a type attribute in there that deteremines if it is private or not and leave some columns blank if it is a private chat. OR Would I make 2 tables one for private chats, and one for group chats?
CodePudding user response:
This is a similar scenario to the general question "should you split sensitive columns into a new table" and the general answer is the same, it is going to depend largely on your data access code and your security framework.
What about a third option, why not just model a Private Chat
as a Group Chat
that only has 2 members in the group?. Sometimes splitting the model into these types is a premature optimisation, especially in the context of a chat style application. For instance, couldn't a private chat benefit from having an image in the same way that a group chat does? Could there not be some benefit to users being able to specify a group name to their own private group?
You will find the whole development and management of your application a lot simpler if there is just one type of chat and it is up to the user to decide how many people can join or indeed if other people can join the chat.
If you still want to explore the 2 conceptual types this is this is an answer that might give you some indirect insights: https://stackoverflow.com/a/74398184/1690217 but ultimately we need additional information to justify selecting one structure over the other. Performance, Security and general data governance are some considerations that have implications or impose caveats on implementation.
From a structural point of view, your Group Chats
and Private Chats
can be both implementations of a common Chat
table, conceptually we could say that both forms inherit from Chat
.
In relational databases we have 3 general options to model inheritance:
- Table Per Hierarchy (TPH)
Use a single table with a discriminator column that determines for each row what the specific type is. Then in your application layer or via views you can query the specific fields that each type and scenario needs.- In TPH the base type is usually an abstract type definition
- Table Per Type (TPT)
The base type and each concrete type exists as their own separate tables. The FK from the inheriting tables is the PK and shares the same PK value as the corresponding record in the base table, creating a 1:0-1 relationship. This requires some slightly more complicated data access logic but it makes it harder to accidentally retrieve aPrivate Chat
in aGroup Chat
context because the data needs to be queried explicitly from the correct table.- In TPT the base type is itself a concrete type and data records do not have to inherit into the extended types at all.
- Simple Isolated Tables (No inheritance in the schema)
This is often the simplest approach, if your tables do have inheritance in the application logic then the common properties would be replicated in each table. This can result in a lot of redundant data access logic, but the OO inheritance in the application layer following DRY principal solves most of code redundancy issues.
This answer to How can you represent inheritance in a database? covers DB inheritance from a more academic and researched point of view.
From a performance point of view, there are benefits to isolating workloads if the usage pattern is significantly different. So if Group Chats
have a different usage profile, either the frequency or type of queries is significantly different, or the additional fields in Group Chat
would benefit from their own index profiles, then splitting the tables will allow your database engine to provide better index management and execution plan optimisations due to more accurate capture of table statistics.
From a security and compliance point of view, a single table implementation (TPH) can reduce the data access logic and therefore the overall attack surface of the code. But a good ORM or code generation strategy usually mitigates any issues that might be raised in this space. Conversely TPH or simple tables make it easier to define database or schema level security policies and constraints.
Ultimately, which solution is best for you will come down to the effort required to implement and maintain the application logic for your choice.
I will sometimes use a mix of TPT and TPH in the same database but often lean towards TPT if I need inheritance within the data schema, this old post explains my reasoning against TPH: Database Design: Discrimator vs Separate Tables with regard to Constraints. My general rule is that if the type needs to be polymorphic, either to be considered of both types or for the type context to somehow change dynamically in the application runtime, then TPT or no inheritance is simpler to implement. I use TPH when the differences between the types is minimal and not expected to reasonably diverge too much over the application life time, but also when the usage and implementations are going to be very similar.
TPT provides a way to express inheritance but also maintain a branch of vastly different behaviours or interactions (on top of the base implementation). many TPT implementations look as if they might as well have been separate tables, the desire to constrain the 1:1 link between the records is often a strong decider when choosing this architectural pattern. A good way to think about this model, even if you do not use inheritance at the application logic level, is that you can extend the base record to include the metadata and behaviours of any of the inheriting types. In fact with TPT it is hard to constrain the data records such that you cannot extend into multiple types.
- Due to this limitation, TPT can often be modelled from the application layer as not using OO Inheritance at all
- TPT complements Composition over Inheritance
TPH is often the default way to model a Domain Model that implements simple inheritance, but this introduces a problem in application logic if you need to change the type or is incompatible with the idea that a single record could be both types. There are simple workarounds for this, but historically this causes issues from a code maintenance point of view, it's a clash of concepts really, TPH aligns with Inheritance more than Composition
In the context of Chat, TPT can work from a Composition point of view. All chats have the same basic features and interactions, but Group Chat
records can have extended metadata and behaviours. Unless you envision Private Chat
having a lot of its own specific implementation there is not really a reason to extend the base concept of Chat
to a Private Chat
implementation if there is no difference in that implementation.
For that reason too though, is there a need to differentiate between Private and Group chats at all from a database perspective? Your application runtime shouldn't be using blind SELECT *
style queries to access the data in either case, it should be requesting the specific fields that it needs for the given context, whether you use a Field in the table, or the Name of the table to discrimate between the different concepts is less important than being able to justify the existence of or the difference between those concepts.