When we run a "COPY INTO from AWS S3 Location" command, do the data files physically get copied from S3 to EC2 VM storage (SSD/RAM)? Or does the data still reside on S3 and get converted to Snowflake format?
And if I run COPY INTO and then suspend the warehouse, would I lose data on resumption?
Please let me know if you need any other information.
CodePudding user response:
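When you run a "COPY INTO from AWS S3 Location" command, Snowflake reads the data files from your S3 location and loads the data into Snowflake-managed S3 storage. That storage is only accessible by querying the table into which you loaded the data. Because the loaded data lives in that storage rather than on the warehouse, it is still there after the warehouse is suspended and resumed.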
When you suspend a warehouse, Snowflake immediately shuts down all idle compute resources for the warehouse, but allows any compute resources that are executing statements to continue until the statements complete, at which time the resources are shut down and the status of the warehouse changes to “Suspended”. Compute resources waiting to shut down are considered to be in “quiesce” mode.
More details: https://docs.snowflake.com/en/user-guide/warehouses-tasks.html#suspending-a-warehouse
Details on the loading mechanism you are using are in the docs: https://docs.snowflake.com/en/user-guide/data-load-s3.html#bulk-loading-from-amazon-s3
CodePudding user response:
The data is loaded into Snowflake tables from an external location such as S3. The source files remain on S3 after the load. If you need to remove them once the copy succeeds, you can add the "PURGE=TRUE" parameter to the "COPY INTO" command.
In other words, the files themselves stay in the S3 location, and the values in them are copied into the Snowflake tables.
Statements that are already running on the warehouse are not affected when the warehouse is suspended; they are allowed to complete first. So there is no data loss in that event.
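To make the "PURGE=TRUE" option mentioned above concrete, here is a minimal Python sketch that only builds the text of a COPY INTO statement; it does not connect to Snowflake or run anything. The table name my_table and the external stage @my_s3_stage are hypothetical placeholders, and the CSV file format is an assumption, so adjust them to your own setup.

```python
def build_copy_into(table: str, stage_path: str,
                    file_type: str = "CSV", purge: bool = False) -> str:
    """Return the text of a Snowflake COPY INTO statement (nothing is executed).

    All names passed in are treated as trusted identifiers; this sketch
    does no quoting or validation.
    """
    statement = (
        f"COPY INTO {table}\n"
        f"  FROM {stage_path}\n"
        f"  FILE_FORMAT = (TYPE = '{file_type}')"
    )
    if purge:
        # Asks Snowflake to delete the source files from the stage
        # after they have been loaded successfully.
        statement += "\n  PURGE = TRUE"
    return statement + ";"


if __name__ == "__main__":
    # Hypothetical table and stage names, for illustration only.
    print(build_copy_into("my_table", "@my_s3_stage/data/", purge=True))
```

Without purge=True the generated statement leaves the files on S3, which matches the default behavior described above.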