Why can't I join to information_schema.columns to get column list for my tables? - Redshift SQL


I have a table called relevant_tables that resembles:

schema_name     table_name
AAA             AA
AAA             AB
AAA             AC
BBB             BA

I came across information_schema.columns, which lets me get the list of columns for a table when I specify WHERE table_name = 'my_table' AND table_schema = 'my_schema'.
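
For example, a single-table lookup of that form (with 'my_schema' and 'my_table' as placeholders) works fine on its own:

    SELECT column_name
    FROM information_schema.columns
    WHERE table_schema = 'my_schema'
      AND table_name = 'my_table';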

I want to get all the columns for all the tables listed in my relevant_tables table, as in the intended output below:

schema_name     table_name     column_name
AAA             AA             A
AAA             AA             B
AAA             AA             C
AAA             AB             A
AAA             AC             A
BBB             BA             A

So I tried to join on schema_name and table_name as follows:

    SELECT
        c.table_schema,
        c.table_name,
        c.column_name
    FROM information_schema.columns c
    JOIN relevant_tables r
    ON c.table_schema = r.schema_name
    AND c.table_name = r.table_name;

However, I'm getting the following error:

[0A000] ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables.

Why am I getting this error and how do I achieve my intended output?

CodePudding user response:

This error occurs because information_schema (like other leader-node-only system tables) lives only on the leader node, while the join with your user table has to run on the compute nodes. There is no execution path that lets a single query combine the two. There may be a compute-node equivalent table, or you can read the info out through a cursor as described in - How to join System tables or Information Schema tables with User defined tables in Redshift

Update / more detail:

Let me try to add some clarity around how you might work around this issue. There are 3 ways I can see:

  1. Find a system table that IS compute-node accessible AND has the information you need. As far as I know there is no such system table that has column names, though I could be wrong as I haven't searched them all. This would be the easiest solution, but I don't think a path exists. Let me know if you find one.
  2. Create a user table with the system table information for use in your query. This is what the linked answer above is about. There are 2 ways to do this: 1) read the system table info out of Redshift (UNLOAD?) and then load it back into a normal table (COPY), or 2) read the system table information into a cursor and then read the cursor and put the data into a normal table (the code piece in the linked answer; a rough sketch of this cursor approach follows this list). Neither path is super fast, and the normal table with the copied system information will only be as up to date as the last time it was created. Once you have the system table information in a normal table, you can use it in any and all queries.
  3. Move database management queries out of Redshift and into some other layer (Lambda?). Redshift is not designed to be an all-inclusive operating environment like some other databases. It is designed to perform the hardest analytic queries on the largest datasets. When you get to more complicated database management routines (taking the results of this query and applying some db changes), this approach will pay off. But this isn't getting your query to run in Redshift; it's just a bigger-picture thought.
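
To make option #2 concrete, here is a minimal sketch of the cursor-based stored procedure approach. This is not the exact code from the linked answer; the procedure name and the target table column_info are made up, and you create column_info as an ordinary user table first:

    -- Ordinary (compute-node) table to hold a snapshot of the system data
    CREATE TABLE IF NOT EXISTS column_info (
        table_schema VARCHAR(128),
        table_name   VARCHAR(128),
        column_name  VARCHAR(128)
    );

    CREATE OR REPLACE PROCEDURE refresh_column_info()
    AS $$
    DECLARE
        rec RECORD;
    BEGIN
        -- Clear out the previous snapshot
        DELETE FROM column_info;

        -- The SELECT is satisfied on the leader node; each fetched row is
        -- then re-inserted as plain literals, which the compute nodes accept
        FOR rec IN SELECT table_schema, table_name, column_name
                   FROM information_schema.columns
        LOOP
            INSERT INTO column_info
            VALUES (rec.table_schema, rec.table_name, rec.column_name);
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    -- Re-run whenever the snapshot needs refreshing (row-by-row, so not fast)
    CALL refresh_column_info();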

I recommend you think about #3 (the long-term picture), but you likely just need to get this to work ASAP. If someone knows of a system table that has column, table, and schema names AND is available to the compute nodes, a lot of people would be interested, but I wouldn't spend a lot of time driving down that path because many have before you. That leaves #2 - create a normal table with the data you need. If you understand cursors (at least a little), the stored procedure is nice because you just need to call the procedure and the normal-table version of the system data will be recreated (though not quickly). If this is too much, then a little SQL that UNLOADs the system table info to S3 and then loads it back in (COPY) will work just fine - you just need a space in S3 where you can dump the file(s).
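
However you build that copy, the payoff is that your original join then works unchanged against it. Assuming the made-up table name column_info from the sketch above:

    SELECT
        r.schema_name,
        r.table_name,
        c.column_name
    FROM column_info c
    JOIN relevant_tables r
    ON c.table_schema = r.schema_name
    AND c.table_name = r.table_name;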

Hope this helps
