Hi, I am trying to move a file across accounts, from a bucket in account A to a bucket in account B, and I am getting the following error:
An error occurred while calling o88.parquet. dt/output1/parquet/_temporary/0/: PUT 0-byte object on dt/output1/parquet/_temporary/0/: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: F99P5W0C8Q28BJ4R; S3 Extended Request ID: VpFGWR9JR7r2yae9v8ezB7HAgJu0uuwn4v3mBAG8CaaJ2q0 sOVFGdxsZ1GzMXhAifSCtdxJ0OM=; Proxy: null), S3 Extended Request ID: VpFGWR9JR7r2yae9v8ezB7HAgJu0uuwn4v3mBAG8CaaJ2q0 sOVFGdxsZ1GzMXhAifSCtdxJ0OM=:AccessDenied
I have the following setup.
Account A has the role cross-accountA-sample-role with the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:ListAllMyBuckets"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::*"
      ]
    },
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::my-bucket"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get*",
        "s3:Put*",
        "s3:List*"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
Trust Relationship in Account A role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{accountBId}:role/{accountBrole}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Account B cross account role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::{accountAId}:role/{accountArole}"
    }
  ]
}
EDIT: Policies attached to the account B role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*",
        "s3-object-lambda:*"
      ],
      "Resource": "*"
    }
  ]
}
and
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:*",
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:ListAllMyBuckets",
        "s3:GetBucketAcl",
        "ec2:DescribeVpcEndpoints",
        "ec2:DescribeRouteTables",
        "ec2:CreateNetworkInterface",
        "ec2:DeleteNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcAttribute",
        "iam:ListRolePolicies",
        "iam:GetRole",
        "iam:GetRolePolicy",
        "cloudwatch:PutMetricData"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket"
      ],
      "Resource": [
        "arn:aws:s3:::aws-glue-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::aws-glue-*/*",
        "arn:aws:s3:::*/*aws-glue-*/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::crawler-public*",
        "arn:aws:s3:::aws-glue-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:*:*:/aws-glue/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags",
        "ec2:DeleteTags"
      ],
      "Condition": {
        "ForAllValues:StringEquals": {
          "aws:TagKeys": [
            "aws-glue-service-resource"
          ]
        }
      },
      "Resource": [
        "arn:aws:ec2:*:*:network-interface/*",
        "arn:aws:ec2:*:*:security-group/*",
        "arn:aws:ec2:*:*:instance/*"
      ]
    }
  ]
}
The access granted here is largely redundant, but at this point I am not concerned about that.
CodePudding user response:
So here is my finding. It may not be an ideal solution, but this is what worked for me.
The catch is something that, in my opinion, AWS does not explain well; if there is such an explanation, I am unaware of it. The first step was to create a policy that allows the Glue job role to assume the role from account A (see my inline policy below). My understanding was that once the role can be assumed, nothing more is needed: some internal API call would access the bucket for me, because the role attached to my Glue job is allowed to assume the role from account A. That assumption turned out to be wrong.
What I also had to do was make the STS AssumeRole call from within my code, which returns temporary credentials, and then update the underlying Hadoop configuration with those temporary credentials. Note: if the Glue job role (in account B) is not allowed to call sts:AssumeRole, the Glue job will fail. Thanks to this article on the STS AssumeRole API, I was able to get cross-account S3 access working. I hope this saves someone some time.
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3

# Assume the account A role from inside the job to get temporary credentials.
sts_connection = boto3.client('sts')
response = sts_connection.assume_role(
    RoleArn='arn:aws:iam::account_id_here:role/my_role_assumed_from_accountA',
    RoleSessionName='GlueTenantASession',
    DurationSeconds=3600,
)
credentials = response['Credentials']

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()

# Point the S3A connector at the temporary credentials.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set('fs.s3a.aws.credentials.provider', 'org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider')
hadoop_conf.set('fs.s3a.access.key', credentials['AccessKeyId'])
hadoop_conf.set('fs.s3a.secret.key', credentials['SecretAccessKey'])
hadoop_conf.set('fs.s3a.session.token', credentials['SessionToken'])

glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node S3 bucket
# Note the s3a:// scheme, so the settings above are the ones that apply.
data_frame = glueContext.create_dynamic_frame.from_options(
    format_options={"multiline": False},
    connection_type="s3",
    format="parquet",
    connection_options={"paths": ["s3a://path_to_bucket_in_other_account"]},
    transformation_ctx="S3bucket_node1",
)
data_frame.show()
job.commit()
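The Hadoop settings in the script above could also be factored into a small helper, which makes the mapping from the STS response to the S3A properties easier to see and to test without a cluster. This is only a sketch: the names s3a_settings and apply_settings are hypothetical, and the property names are the ones used in the script above.

```python
from typing import Dict, Mapping

# Credentials provider class used in the script above.
S3A_TEMP_PROVIDER = "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"


def s3a_settings(credentials: Mapping[str, str]) -> Dict[str, str]:
    """Map the 'Credentials' part of an assume_role response to S3A properties."""
    return {
        "fs.s3a.aws.credentials.provider": S3A_TEMP_PROVIDER,
        "fs.s3a.access.key": credentials["AccessKeyId"],
        "fs.s3a.secret.key": credentials["SecretAccessKey"],
        "fs.s3a.session.token": credentials["SessionToken"],
    }


def apply_settings(hadoop_conf, settings: Mapping[str, str]) -> None:
    """Set each property on any object exposing set(key, value)."""
    for key, value in settings.items():
        hadoop_conf.set(key, value)
```

In the Glue job this would be called as apply_settings(sc._jsc.hadoopConfiguration(), s3a_settings(response['Credentials'])) before creating the GlueContext. Keep in mind that the temporary credentials expire after DurationSeconds, so a job that runs longer would need to assume the role again.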
My Inline policy to assume role from account A
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::account_id:role/my_role_assumed_from_accountA"
    }
  ]
}
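If the same inline policy is needed for more than one role, it could be generated instead of edited by hand. A minimal sketch, assuming only the policy shape shown above; the function name assume_role_policy is hypothetical, and the account ID and role name are placeholders you would supply.

```python
import json


def assume_role_policy(account_id: str, role_name: str) -> str:
    """Return an inline policy (as JSON text) allowing sts:AssumeRole on one role."""
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": role_arn,
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The returned text can be pasted into the inline policy editor of the Glue job role in account B.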