Data Ingestion in Amazon Redshift-CodePudding

I have multiple data source from which I need to build and implement a DWH in AWS. I have one challenge with respect to one of my unstructured data source (Data coming from different APIs). How can I ingest data from this source into the Amazon Redshift??? Can we first pull it into Amazon S3 bucket and then integrate S3 with Amazon redshift? What is a better approach?

CodePudding user response：

Yes, S3 first. You APIs can write to S3 or/and if you like you can use a service like Kinesis (with or without firehose) to populate S3. From there it is just work in Redshift.

CodePudding user response：

Without knowing more about the sources, yes S3 is likely the right approach - whether you require latency in seconds, minutes or hours will be an important consideration.

If latency is not a driving concern, simply:

Set up an S3 bucket to use a destination from your initial source(s).
Create tables in your Redshift database (loading data from S3 to Redshift requires pre-existing destination table).
Use the COPY command load from S3 to Redshift.

As noted, there may be value in Kinesis, especially if you're working with real-time data streams (the service recently introduced support for skipping S3 and streaming directly to Redshift).

S3 is probably the easier approach, if you're not trying to analyze real-time streams.