Home > OS >  Redshift - Choosing DistKey
Redshift - Choosing DistKey

Time:08-13

Is it possible to use 2 dist keys in a table?

I have a query which makes use of 3 tables.

Table 1 - Table 2 join on ID1 and Table 2 - Table 3 join on ID2.

I need to display the most recent data in this case.

So, according to my analysis sortkey can be updated_time since I'd be limiting data on the basis of this field.

Further, dist key should the attribute which would be most used for joins.

That means, for Table 1 distkey can be ID1, for Table 3 distkey can be ID2.

Can there be 2 distkeys for table 2 - ID1 and ID2?

CodePudding user response:

No, it is not possible to specify multiple DISTKEYs.

From CREATE TABLE - Amazon Redshift:

DISTKEY
Keyword that specifies that the column is the distribution key for the table. Only one column in a table can be the distribution key.

The distribution key is used to distribute rows amongst servers. Preferably, when JOINing tables, they should have the same DISTKEY so that data is co-located on the same server and does not need to be sent between servers.

You will need to optimize for your most important table. It is not always possible to optimize for all tables and queries.

It is also possible to specify an AUTO distribution style. From Distribution styles - Amazon Redshift:

With AUTO distribution, Amazon Redshift assigns an optimal distribution style based on the size of the table data. For example, Amazon Redshift initially assigns ALL distribution to a small table, then changes to EVEN distribution when the table grows larger. When a table is changed from ALL to EVEN distribution, storage utilization might change slightly. The change in distribution occurs in the background, in a few seconds.

When you set DISTSTYLE to AUTO, Amazon Redshift might change the distribution of your table data to have a KEY-based distribution style.

I recommend studying Tuning query performance - Amazon Redshift and watching some AWS Reinvent videos about Redshift to discover how to best optimize tables and queries.

CodePudding user response:

Adding to John R's correct answer.

You should consider making either table 2 or table 3 DISTSTYLE ALL. You don't mention sizes or other use patterns for these tables so this isn't a slam dunk. It looks like these tables are likely dimensional in nature and the space increase for DISTSTYLE ALL is often minor or even a net win when dealing with smaller tables.

  • Related