Efficient Way To Design Database For My Specific Use Case-CodePudding

I am building a website where users can view emails that are fetched from my gmail account.

Users can read emails, change their labels & archive them. Each email has metadata associated with it, and users can search through the emails based on the metadata. Furthermore, each user is associated with an organization. Changes made to an email (e.g., if the email is archived, or if the tags are changed) by any one user gets reflected across the organization.

Right now, I store all emails in a single table along with their metadata. However, the problem is that I now have over 20,000 emails in the database, and searching through them based on the metadata takes too much time.

Now one way to optimize this is that when a user runs a search command then the system should only search through emails that are in the inbox & not archived or deleted. But the issue is that where one organization might have archived an email, another organization might have not. So I can not create separate tables for Inbox & Archive. By default emails also get auto-archived after some time (this option can be disabled also), so the Inbox generally has around 4 thousand emails, whereas the archive has many many times that.

My question is does it make sense to create separate Inbox & Archive tables for each organization & just copy all new incoming emails to the tables? Since organizations only join by invitation, so I do not expect the total number to cross 100. Or would this just explode and become too difficult to handle in the code later on, with so many tables.

I am using PostgreSQL for this.

CodePudding user response：

If your operational workflow says "upon adding a new customer create such-and-such a table" then you have a serious database design problem. When you have more than about 50 customers things will slow down due to per-table overhead. In other words, when you start to succeed in business you will start to fail in performance. Not good.

You have a message entity. It, no doubt, contains the message's text, subject, timestamp, from, to, and other attributes that form part of the original message. Each message will have a unique (primary key) message_id. But the entity should not contain attributes like inbox and archive, because those attributes relate to the organization.

You need an org entity. Each organization has a unique org_id, a 'name and other attributes of the organization.

Then you need an org_message table. Its primary key contains both org_id and message_id. And it will contain Boolean attributes like archived and read, and a VARCHAR attribute naming its current folder. So, each org's window into your message table is organized by the org_messages.

If you start with an organization named, for example, shipping, and you want to see all its messages, you use a query like this.

SELECT org.id, org.name,
       message.*,
       COALESCE(org_message.read, 0) unread,
       COALESCE(org_message.archived, 0) archived,
       COALESCE(org_message.folder, 'inbox') folder
  FROM org
  LEFT JOIN org_message ON org.org_id = org_message.org_id
  LEFT JOIN message ON message.message_id = org_message.message_id
 WHERE org.name = 'shipping';

The LEFT JOINs and COALESCEs work to set each org's defaults for each message to unread, not archived, and in the inbox folder. That way you don't have to create a row in org_message for each organization and each message until the org handles the message.

If you want to mark a message as read and archived for a particular org, you INSERT a row into org_message, using ON CONFLICT DO UPDATE

INSERT INTO org_message (org_id, message_id, read, archived, folder)
                 VALUES (?, ?, ?, ?, ?) ON CONFLICT DO UPDATE;

That either sets or updates the org's attributes for the messages

If you find that searching these tables is too slow, you'll need indexes. That's the subject of a different question.