Is there a way to get raw files' content from GitHub using the Amazon AppFlow integration?

Time:12-17

If so, which object and subobject would that be? The contents of my CSV file do not show up when I use Repository (repos) as the object and ystoneman (my GitHub username) as the subobject. Instead, every column contains only metadata.

The GitHub REST API itself seems to support this via the Repository Contents API. For example, I'm able to get the contents of an 18 MB file with the following cURL command:

curl \
  -H "Accept: application/vnd.github.raw+json" \
  -H "Authorization: Bearer TOKEN" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/ystoneman/hotel-bookings/contents/hotel_bookings.csv

And here's an example of the output (data from Kaggle):

City Hotel,0,34,2017,August,35,31,2,5,2,0,0,BB,DEU,Online TA,TA/TO,0,0,0,D,D,0,No Deposit,9,NULL,0,Transient,157.71,0,4,Check-Out,2017-09-07
City Hotel,0,109,2017,August,35,31,2,5,2,0,0,BB,GBR,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,89,NULL,0,Transient,104.4,0,0,Check-Out,2017-09-07
City Hotel,0,205,2017,August,35,29,2,7,2,0,0,HB,DEU,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,9,NULL,0,Transient,151.2,0,2,Check-Out,2017-09-07
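
For anyone who wants to script the same request outside of cURL, here is a minimal Python sketch using only the standard library. The function names (build_contents_request, fetch_raw_file) are hypothetical and chosen just for this illustration; the endpoint and headers mirror the cURL command above. It assumes a token with read access to the repository.

```python
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def build_contents_request(owner, repo, path, token):
    """Build a GET request for a file via the Repository Contents API.

    Hypothetical helper for illustration; it sends the same headers
    as the cURL command in the question.
    """
    # Percent-encode the file path but keep "/" so nested paths still work.
    quoted_path = urllib.parse.quote(path)
    url = f"{API_ROOT}/repos/{owner}/{repo}/contents/{quoted_path}"
    headers = {
        # Ask for the raw file body instead of the JSON metadata wrapper.
        "Accept": "application/vnd.github.raw+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_raw_file(owner, repo, path, token):
    """Send the request and return the response body as bytes."""
    request = build_contents_request(owner, repo, path, token)
    with urllib.request.urlopen(request) as response:
        return response.read()


# Example call (needs network access and a real token, so it is left commented out):
# data = fetch_raw_file("ystoneman", "hotel-bookings", "hotel_bookings.csv", "TOKEN")
# print(data[:200].decode("utf-8", errors="replace"))
```

Building the request is kept separate from sending it so the URL and headers can be inspected without touching the network.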

Using Repository, Branch, or Commit as the source object does not seem to yield this data, even when I use an auth token with all read actions allowed on the repository, set the destination to S3, and choose "Map all fields directly".

Answer:

Got the answer from the Amazon AppFlow service team. Currently only the Amazon S3 source on AppFlow supports unstructured data. So no, getting a CSV file's contents from GitHub via AppFlow would not work at this time, since GitHub's API treats a CSV file within a repo as a raw blob rather than as structured data.
