Currently, we use AWS IAM User permanent credentials to transfer customers' data from our company's internal AWS S3 buckets to customers' Google BigQuery tables following BigQuery Data Transfer Service documentation.
Using permanent credentials possesses security risks related to the data stored in AWS S3.
We would like to use AWS IAM Role temporary credentials, which require the support of a session token on the BiqQuery side to get authorized on the AWS side.
Is there a way that the BigQuery Data Transfer Servce can use AWS IAM roles or temporary credentials to authorise against AWS and transfer data?
We considered Omni framework (https://cloud.google.com/bigquery/docs/omni-aws-cross-cloud-transfer) to transfer data from S3 to BQ, however, we faced several concerns/limitations:
- Omni framework targets data analysis use-case rather than data transfer from external services. This concerns us that the design of Omni framework may have drawbacks in relation to data transfer at high scale
- Omni framework currently supports only AWS-US-EAST-1 region (we require support at least in AWS-US-WEST-2 and AWS-EU-CENTRAL-1 and corresponding Google regions). This is not backward compatible with current customers' setup to transfer data from internal S3 to customers' BQ.
- Our current customers will need to signup for Omni service to properly migrate from the current transfer solution we use
We considered a workaround with exporting data from S3 through staging in GCS (i.e. S3 -> GCS -> BQ), but this will also require a lot of effort from both customers and our company's sides to migrate to the new solution.
CodePudding user response:
Is there a way that the BigQuery Data Transfer Servce can use AWS IAM roles or temporary credentials to authorise against AWS and transfer data?
No unfortunately.
The official Google BigQuery Data Transfer Service only mentions AWS access keys all throughout the documentation:
The access key ID and secret access key are used to access the Amazon S3 data on your behalf. As a best practice, create a unique access key ID and secret access key specifically for Amazon S3 transfers to give minimal access to the BigQuery Data Transfer Service. For information on managing your access keys, see the AWS general reference documentation.
The irony of the Google documentation is that while it refers to best practices and links to the official AWS docs, it actually doesn't endorse best practices and ignores what AWS mention:
We recommend that you use temporary access keys over long term access keys, as mentioned in the previous section.
Important
Unless there is no other option, we strongly recommend that you don't create long-term access keys for your (root) user. If a malicious user gains access to your (root) user access keys, they can completely take over your account.
You have a few options:
hook into both sides manually (i.e. link up various SDKs and/or APIs)
find an alternative BigQuery-compatible service, which does as such
accept the risk of long-term access keys.
In conclusion, Google is at fault here of not following security best practices and you - as a consumer - will have to bear the risk.