I am trying to create a SQL table that stores a customer id and a zipcode, only these 2 columns. The combination of these 2 values makes a row unique. I have 3 options in mind but am not sure which one would be most efficient. The table will hold around 200,000 rows; reads are frequent and writes happen once a day.
The select query will get all customers matching the input zipcodes.
Example:
Select customerid from dbo.customerzipcode where zipcode in (<multiple zipcodes>)
Option 1:
- Create a table with 2 columns (customerid and zipcode)
- Create a composite primary key for these 2 columns.
Option 2:
- Create a table with 3 columns (id, customerid and zipcode)
- Make id an identity column and the primary key
- Create a unique constraint on customerid and zipcode
Option 3:
- Create a table with 3 columns (id, customerid and zipcode)
- Create a non clustered index for zipcode alone.
Can you please share which option would be better?
CodePudding user response:
The best option for your use case will depend on the specific requirements and constraints of your application.
Option 1: Creating a composite primary key on the customerid and zipcode columns is a good approach if the combination of these two values uniquely identifies each row in the table. It ensures that no duplicate rows can be inserted, and it allows fast lookups by zipcode as long as zipcode is the leading column of the key.
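As a rough sketch of Option 1 in T-SQL (the table name suffix "_opt1" and the column types are assumptions for illustration, not taken from the question):

```sql
-- Option 1 sketch: two columns, composite primary key.
-- The "_opt1" name and the column types are assumed for illustration.
CREATE TABLE dbo.customerzipcode_opt1
(
    zipcode    varchar(10) NOT NULL,
    customerid int         NOT NULL,
    -- zipcode is listed first so that "WHERE zipcode IN (...)"
    -- can seek on the primary key index.
    CONSTRAINT pk_customerzipcode_opt1 PRIMARY KEY (zipcode, customerid)
);
```

If customerid were listed first instead, the key would still guarantee uniqueness, but a filter on zipcode alone could no longer seek on it.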
Option 2: Creating a separate identity column (auto-incrementing primary key) and a unique constraint on the customerid and zipcode columns is a good approach if you want a single-column primary key for each row while still ensuring that the combination of these two values is unique. This option is useful if other tables need to reference these rows through a foreign key.
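A minimal sketch of Option 2, again with an assumed "_opt2" table name and assumed column types:

```sql
-- Option 2 sketch: surrogate identity key plus a unique constraint.
-- The "_opt2" name and the column types are assumed for illustration.
CREATE TABLE dbo.customerzipcode_opt2
(
    id         int IDENTITY(1,1) NOT NULL,
    customerid int               NOT NULL,
    zipcode    varchar(10)       NOT NULL,
    CONSTRAINT pk_customerzipcode_opt2 PRIMARY KEY (id),
    -- The unique constraint is backed by an index; putting zipcode
    -- first lets that index also serve lookups by zipcode.
    CONSTRAINT uq_customerzipcode_opt2 UNIQUE (zipcode, customerid)
);
```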
Option 3: Creating a non-clustered index on the zipcode column alone is a good approach if you want to optimize queries that filter on zipcode, such as getting the customers for a set of zipcodes. Note that this index does not by itself enforce uniqueness of the customerid and zipcode combination.
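Option 3 might look roughly like the following. The "_opt3" name and column types are assumptions, and the INCLUDE clause is an addition that is not part of the option as described; it is shown because it lets the index return customerid without a lookup into the base table.

```sql
-- Option 3 sketch: identity column plus a nonclustered index on zipcode.
-- The "_opt3" name and the column types are assumed for illustration.
CREATE TABLE dbo.customerzipcode_opt3
(
    id         int IDENTITY(1,1) NOT NULL,
    customerid int               NOT NULL,
    zipcode    varchar(10)       NOT NULL
);

-- INCLUDE (customerid) is an assumption beyond the original option:
-- it makes the index cover "SELECT customerid ... WHERE zipcode IN (...)".
CREATE NONCLUSTERED INDEX ix_customerzipcode_opt3_zipcode
    ON dbo.customerzipcode_opt3 (zipcode)
    INCLUDE (customerid);
```

Nothing in this sketch prevents the same customerid and zipcode pair from being inserted twice.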
In your case, since reads dominate and you will be querying customers by zipcode, option 3 would be the best fit, as the index on zipcode directly serves that query. However, it is always good to consider the trade-off between read and write performance, and adjust your design accordingly.
It is also worth testing your queries against the table with sample data to confirm that the design you choose meets your performance needs.
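One way to run such a test in SQL Server is to turn on I/O and timing statistics around the query and compare the reported reads across designs. This is only a sketch; the zipcode values below are made-up sample inputs.

```sql
-- Report logical reads and elapsed time for the statements that follow.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- The zipcode values are placeholder sample inputs (assumptions).
SELECT customerid
FROM dbo.customerzipcode
WHERE zipcode IN ('10001', '10002', '94105');

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running the same query against each candidate design, loaded with a realistic number of rows, gives a like-for-like comparison of logical reads.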
CodePudding user response:
Select customerid from dbo.customerzipcode where zipcode in (<multiple zipcodes>)
The canonical design would have an index with each column as the leading column, to support efficient lookup by zipcode or by customerid, e.g.:
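create table customerzipcode
(
    zipcode    varchar(10) not null,
    customerid int         not null references customer,
    -- zipcode leads the primary key, so lookups by zipcode can seek on it
    constraint pk_customerzipcode primary key (zipcode, customerid),
    -- secondary index supports lookups by customerid
    index ix_customerzip_customerid (customerid)
)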
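With a table defined this way, the two lookup directions could be exercised as sketched below. The literal zipcodes and the customerid value 42 are made-up sample inputs.

```sql
-- Lookup by zipcode: served by the primary key, whose leading column is zipcode.
SELECT customerid
FROM customerzipcode
WHERE zipcode IN ('10001', '94105');  -- sample values (assumptions)

-- Lookup by customerid: served by ix_customerzip_customerid.
SELECT zipcode
FROM customerzipcode
WHERE customerid = 42;                -- sample value (assumption)
```

Assuming the primary key is clustered (the SQL Server default), the secondary index on customerid also carries the clustering key columns, so the second query can be answered from that index alone.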