What is the difference between Redshift External Tables and Redshift Spectrum-CodePudding

After reading through the documentation I'm finding it difficult to understand how these two things fit together or if they are the same thing?

Is Spectrum the underlying technology name and external tables sit on top of it? Can you use External Tables without using Redshift Spectrum? If they are different, what is the difference?

CodePudding user response：

To work with any data in Redshift (RS), you need to define the schema of the data. This is done through tables, just like in tradiotional databases, such as MySQL.

RS Spectrum (RSS) allows you to interact directly with data in S3 (no need to COPY it to RS). But to interact with that data you need to define its schema, as RS can't work with un-defined data. This is done through External Tables (ET). So ET are same as regular RS tables with the exception that data is stored in S3, not in RS nodes.

To answer your questions:

Can you use External Tables without using Redshift Spectrum

No, you can't. ET are only used for RSS to query data in S3 and no other external data source. For querying data from other sources than S3 (e.g. MySQL, PostgreSQL), there are federated queries.

But the opposite is true. You can use RSS without creating ET manually if you have tables defined in AWS Glue Data Catalog.

If they are different, what is the difference?

ET describes a schema (column names and their data types) of your data, as well as its location in S3. RSS is computational engine that AWS uses in the backend to actually execute your quires on S3 data.