Amazon Glue - Connection timeout error during job

Time: 08-08

I'm trying to create an AWS Glue job that moves data from a Redshift cluster to DynamoDB. The connection is established, but I'm getting the following error:

An error occurred while calling o160.pyWriteDynamicFrame. Unable to execute HTTP request: Connect to dynamodb.us-east-1.amazonaws.com:443 [dynamodb.us-east-1.amazonaws.com/52.119.232.254] failed: connect timed out

There's no problem with the Glue connection, and the crawler is working, but I don't know why I'm getting this error. The Availability Zone of the Redshift cluster is us-east-1b, so I set the connection's subnet to the corresponding subnet.

I've followed this guide: https://aws.amazon.com/premiumsupport/knowledge-center/connection-timeout-glue-redshift-rds/ and added the connection, but I'm still getting the error.

The Glue script is the following:

    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)

    # Script generated for node Redshift Cluster
    RedshiftCluster_node1 = glueContext.create_dynamic_frame.from_catalog(
        database="redshift_bbd",
        redshift_tmp_dir=args["TempDir"],
        table_name="financial_data",
        transformation_ctx="RedshiftCluster_node1",
    )

    # Script generated for node ApplyMapping
    ApplyMapping_node2 = ApplyMapping.apply(
        frame=RedshiftCluster_node1,
        mappings=[
            ("units_7d", "int", "units_7d", "int"),
            ("pcogs_total_13w", "decimal", "pcogs_total_13w", "decimal"),
            (
                "npp_contra_cogs_13w_total",
                "decimal",
                "npp_contra_cogs_13w_total",
                "decimal",
            ),
            ("revenue_7d", "decimal", "revenue_7d", "decimal"),
            ("asin", "string", "asin", "string"),
            ("netppm_4w", "decimal", "netppm_4w", "decimal"),
        ],
        transformation_ctx="ApplyMapping_node2",
    )

    # Script generated for node DynamoDB bucket
    Datasink1 = glueContext.write_dynamic_frame_from_options(
        frame=ApplyMapping_node2,
        connection_type="dynamodb",
        connection_options={
            "dynamodb.output.tableName": "FINANCIAL_DATA",
            "dynamodb.throughput.write.percent": "1.0",
        },
    )

    job.commit()

CodePudding user response:

It turned out that there was no connectivity from my Glue job to DynamoDB. I added VPC gateway endpoints for both S3 and DynamoDB (adding only one of them wasn't enough), and my job worked.
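For reference, creating the two gateway endpoints can be done with the AWS CLI along these lines. The VPC and route table IDs below are placeholders, and this assumes the Glue connection's subnet is associated with that route table:

```shell
# Gateway endpoints route S3 and DynamoDB traffic inside the VPC, so the
# Glue job no longer needs a path to the public service endpoints.
# vpc-0abc1234 and rtb-0abc1234 are placeholder IDs -- substitute your own.
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0abc1234 \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-0abc1234

aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0abc1234 \
    --service-name com.amazonaws.us-east-1.dynamodb \
    --route-table-ids rtb-0abc1234
```

You can confirm both endpoints exist afterwards with `aws ec2 describe-vpc-endpoints`.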

For more information: https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-ddb.html

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/vpc-endpoints-dynamodb-tutorial.html
