CETAS times out for large tables in Synapse Serverless SQL


I'm trying to create a new external table with a CETAS (CREATE EXTERNAL TABLE AS SELECT * FROM &lt;table&gt;) statement from an existing external table in an Azure Synapse Serverless SQL pool. The table I'm selecting from is a very large external table built over roughly 30 GB of Parquet data in ADLS Gen2 storage, but the query always times out after about 30 minutes. I've tried premium storage and most, if not all, of the suggestions made here, but it didn't help and the query still times out.
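For reference, the statement I'm running has roughly this shape (the table, data source, and file format names here are hypothetical placeholders):

    CREATE EXTERNAL TABLE dbo.MyTableCopy
    WITH (
        LOCATION    = 'copy/mytable/',      -- hypothetical output folder
        DATA_SOURCE = MyDataSource,         -- hypothetical external data source
        FILE_FORMAT = ParquetFormat         -- hypothetical parquet file format
    )
    AS
    SELECT * FROM dbo.MyTable;              -- the existing external table

The error Synapse Studio returns is: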

Statement ID: {550AF4B4-0F2F-474C-A502-6D29BAC1C558} | Query hash: 0x2FA8C2EFADC713D | Distributed request ID: {CC78C7FD-ED10-4CEF-ABB6-56A3D4212A5E}. Total size of data scanned is 0 megabytes, total size of data moved is 0 megabytes, total size of data written is 0 megabytes. Query timeout expired.

The core use case is that, given only the external table's name, I want to create a copy, in Azure storage itself, of the data that the external table is built over.

Is there a way to resolve this timeout issue or a better way to solve the problem?

CodePudding user response:

This is a limitation of Serverless.

Query timeout expired

The error Query timeout expired is returned if the query executed for more than 30 minutes on a serverless SQL pool. This is a limit of serverless SQL pool that cannot be changed. Try to optimize your query by applying best practices, or try to materialize parts of your queries using CETAS. Check whether a concurrent workload is running on the serverless pool, because other queries might be taking the resources; in that case you might split the workload across multiple workspaces.

Self-help for serverless SQL pool - Query Timeout Expired
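If the copy has to stay in serverless SQL, one option within the fixed 30-minute limit is to materialize the data in pieces, since each CETAS statement runs under its own timeout. A sketch, assuming there is a column you can filter on (the table, column, and value below are hypothetical):

    CREATE EXTERNAL TABLE dbo.MyTableCopy_2021
    WITH (
        LOCATION    = 'copy/mytable/2021/', -- hypothetical per-chunk output folder
        DATA_SOURCE = MyDataSource,
        FILE_FORMAT = ParquetFormat
    )
    AS
    SELECT *
    FROM   dbo.MyTable
    WHERE  YearKey = 2021;                  -- hypothetical partitioning filter

Repeat per chunk; each statement gets its own 30-minute window.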

The core use case is that, given only the external table's name, I want to create a copy, in Azure storage itself, of the data that the external table is built over.

This is simple to do with a Data Factory copy activity, a Spark job, or AzCopy.
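Since only the external table name is known, you can first resolve the folder the table reads from via the catalog views, then hand that path to whichever tool does the copy. A sketch (the table name is a hypothetical placeholder):

    -- Resolve the storage folder behind an external table.
    SELECT  t.name,
            ds.location AS data_source_root, -- e.g. the abfss:// or https:// root
            t.location  AS table_folder      -- path relative to the data source root
    FROM    sys.external_tables t
    JOIN    sys.external_data_sources ds
            ON ds.data_source_id = t.data_source_id
    WHERE   t.name = 'MyTable';              -- hypothetical table name

Concatenating data_source_root and table_folder gives the source path for a Data Factory copy activity, a Spark read, or an AzCopy command.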
