Data Ingestion in Amazon Redshift

Time:05-26

I have multiple data sources from which I need to build and implement a DWH in AWS. My challenge is with one unstructured data source (data coming from different APIs). How can I ingest data from this source into Amazon Redshift? Can we first pull it into an Amazon S3 bucket and then integrate S3 with Amazon Redshift? Or is there a better approach?

CodePudding user response:

Yes, S3 first. Your APIs can write to S3 directly, or, if you like, you can use a service like Kinesis (with or without Firehose) to populate S3. From there it is just work in Redshift.
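As a rough sketch of the "APIs write to S3" step, the snippet below batches API records into newline-delimited JSON and writes them under a date-partitioned key. Everything named here is an assumption for illustration: the helper names (`to_ndjson`, `build_key`, `write_batch`), the key layout, and the bucket and prefix values are hypothetical, and `s3_client` is assumed to be a boto3 S3 client (only its `put_object` method is used).

```python
import json
from datetime import datetime, timezone


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)


def build_key(prefix, source, now=None):
    """Build a date-partitioned object key (hypothetical layout, adjust as needed)."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}/{source}/{now:%Y/%m/%d}/{now:%H%M%S}.json"


def write_batch(s3_client, bucket, prefix, source, records, now=None):
    """Write one batch of API records to S3 and return the object key.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    key = build_key(prefix, source, now)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=to_ndjson(records).encode("utf-8"),
    )
    return key
```

Called as `write_batch(boto3.client("s3"), "my-bucket", "raw", "orders_api", records)`, this would leave one file per batch under `raw/orders_api/<yyyy>/<mm>/<dd>/`. One JSON object per line is a layout that Redshift's COPY can read later.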

CodePudding user response:

Without knowing more about the sources, yes, S3 is likely the right approach. Whether you need latency of seconds, minutes, or hours will be an important consideration.

If latency is not a driving concern, simply:

  1. Set up an S3 bucket to use as a destination for your initial source(s).
  2. Create tables in your Redshift database (loading data from S3 to Redshift requires a pre-existing destination table).
  3. Use the COPY command to load from S3 to Redshift.
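The last step can be sketched as a small helper that assembles the COPY statement. This is only an illustration under stated assumptions: the function name `build_copy_statement` is hypothetical, the table name, S3 path, and IAM role ARN are placeholders, and the files are assumed to be JSON with keys that match the target table's column names (which is what `JSON 'auto'` expects).

```python
def build_copy_statement(table, s3_uri, iam_role_arn):
    """Return a Redshift COPY statement that loads JSON files from S3.

    The table name is inserted as-is, so it must come from trusted code,
    never from user input. Quoted values are checked for single quotes.
    """
    for value in (s3_uri, iam_role_arn):
        if "'" in value:
            raise ValueError(f"unexpected quote in {value!r}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS JSON 'auto';"
    )


# Placeholder values for illustration only.
print(build_copy_statement(
    "staging.api_events",
    "s3://my-bucket/raw/orders_api/",
    "arn:aws:iam::123456789012:role/MyRedshiftCopyRole",
))
```

The resulting statement can be run from any SQL client connected to the cluster. Pointing `FROM` at a prefix rather than a single file loads every object under that prefix in one COPY.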

As noted, there may be value in Kinesis, especially if you're working with real-time data streams (the service recently introduced support for skipping S3 and streaming directly to Redshift).

S3 is probably the easier approach if you're not trying to analyze real-time streams.
