So, I have a table that typically has around 300-500K rows being inserted in to it within a 24 hour time period. Data is also continuously queried from this table. There is an Intime column that holds the date time that the data is inserted. The data needs to be retained for a period of 3 months as of date. The end-users could also call an API that would update a status flag on all the records for a user within the same time period. TYpically we would just partition the table based on the month or day and set up a rolling partition to handle this.
But given that the data could actually be updated across partitions through a single query, is there any downside to partitioning the table. Any design suggestions on on how to handle such a table is also appreciated.
CREATE TABLE [dbo].[tbl_Message](
[MessageId] [bigint] IDENTITY(1,1) NOT NULL,
[MessageText] [nvarchar](max),
[UserId] [varchar](100),
[Status] [bit] NULL,
[Intime] [datetime] NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
CodePudding user response:
Updating rows in a single query that happen to be in different partitions is not a concern, AFAIK. Indexing is the key improving performance and concurrency regardless of whether one uses table partitioning.
Table partitioning can improve manageability of large tables with a sliding window but there are considerations before doing so. A sliding window requires alignment so all indexes must be partitioned and the partitioning column needs to be part of the primary key (presumably MessageID
here) and all other unique indexes.
Also, queries that do not specify the partitioning column will need to touch all partitions. Consequently, a query like SELECT * [dbo].[tbl_Message] WHERE MessageID = 1;
will need 90 index seeks with a 3-month daily sliding a daily to return a single row. Be sure to examine execution plans to avoid performance degradation.