I'm trying to create an AWS Glue job that copies data from a Redshift cluster to DynamoDB. The connection is established, but I'm getting the following error:
An error occurred while calling o160.pyWriteDynamicFrame. Unable to execute HTTP request: Connect to dynamodb.us-east-1.amazonaws.com:443 [dynamodb.us-east-1.amazonaws.com/52.119.232.254] failed: connect timed out
There's no problem with the Glue connection, and the crawler is working, so I don't know why I'm getting this error. The Availability Zone of the Redshift cluster is us-east-1b, so I set the subnet to the corresponding one.
I've followed this link: https://aws.amazon.com/premiumsupport/knowledge-center/connection-timeout-glue-redshift-rds/ and added the connection, but I'm still getting the error.
The Glue script is the following:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
# TempDir must be resolved here because it is passed as redshift_tmp_dir below
args = getResolvedOptions(sys.argv, ["JOB_NAME", "TempDir"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)
# Script generated for node Redshift Cluster
RedshiftCluster_node1 = glueContext.create_dynamic_frame.from_catalog(
    database="redshift_bbd",
    redshift_tmp_dir=args["TempDir"],
    table_name="financial_data",
    transformation_ctx="RedshiftCluster_node1",
)
# Script generated for node ApplyMapping
ApplyMapping_node2 = ApplyMapping.apply(
    frame=RedshiftCluster_node1,
    mappings=[
        ("units_7d", "int", "units_7d", "int"),
        ("pcogs_total_13w", "decimal", "pcogs_total_13w", "decimal"),
        (
            "npp_contra_cogs_13w_total",
            "decimal",
            "npp_contra_cogs_13w_total",
            "decimal",
        ),
        ("revenue_7d", "decimal", "revenue_7d", "decimal"),
        ("asin", "string", "asin", "string"),
        ("netppm_4w", "decimal", "netppm_4w", "decimal"),
    ],
    transformation_ctx="ApplyMapping_node2",
)
# Script generated for node DynamoDB bucket
Datasink1 = glueContext.write_dynamic_frame_from_options(
    frame=ApplyMapping_node2,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.output.tableName": "FINANCIAL_DATA",
        "dynamodb.throughput.write.percent": "1.0",
    },
)
job.commit()
CodePudding user response:
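It turned out that my Glue job had no network connectivity to DynamoDB. Because the job runs inside the VPC of the Glue connection, requests to the public DynamoDB endpoint time out unless the subnet has a route to it. I added VPC endpoints for both S3 and DynamoDB (adding only one of them wasn't enough), and the job worked.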
For more information: https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-ddb.html
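For anyone who prefers to script this instead of using the console, here is a minimal sketch of creating the two gateway endpoints with boto3. It is not what I ran originally, so treat it as an assumption-laden example: the VPC ID and route table ID in the usage note are hypothetical placeholders, the helper names are my own, and the region is taken from the error message above. The route tables you pass should be the ones associated with the subnet used by the Glue connection.

```python
# Region taken from the endpoint in the error message; adjust if yours differs.
REGION = "us-east-1"


def gateway_endpoint_requests(vpc_id, route_table_ids, region=REGION):
    """Build the create_vpc_endpoint arguments for the S3 and DynamoDB gateway endpoints."""
    return [
        {
            "VpcEndpointType": "Gateway",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "RouteTableIds": list(route_table_ids),
        }
        for service in ("s3", "dynamodb")
    ]


def create_gateway_endpoints(vpc_id, route_table_ids, region=REGION):
    """Create both endpoints; needs boto3 and AWS credentials with EC2 permissions."""
    import boto3  # imported lazily so the helper above can be used without boto3

    ec2 = boto3.client("ec2", region_name=region)
    endpoint_ids = []
    for request in gateway_endpoint_requests(vpc_id, route_table_ids, region):
        response = ec2.create_vpc_endpoint(**request)
        endpoint_ids.append(response["VpcEndpoint"]["VpcEndpointId"])
    return endpoint_ids
```

Usage would look like `create_gateway_endpoints("vpc-0123456789abcdef0", ["rtb-0123456789abcdef0"])`, with both IDs replaced by your own. Gateway endpoints add routes to the given route tables, so the job reaches S3 and DynamoDB without leaving the VPC.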