running r script in AWS-CodePudding

Looking at this page and this piece of code in particular:

import boto3

account_id = boto3.client("sts").get_caller_identity().get("Account")
region = boto3.session.Session().region_name

ecr_repository = "r-in-sagemaker-processing"
tag = ":latest"

uri_suffix = "amazonaws.com"
processing_repository_uri = "{}.dkr.ecr.{}.{}/{}".format(
    account_id, region, uri_suffix, ecr_repository   tag
)

# Create ECR repository and push Docker image
!docker build -t $ecr_repository docker
!$(aws ecr get-login --region $region --registry-ids $account_id --no-include-email)
!aws ecr create-repository --repository-name $ecr_repository
!docker tag {ecr_repository   tag} $processing_repository_uri
!docker push $processing_repository_uri

This is not pure Python obviously? Are these AWS CLI commands? I have used docker previously but I find this example very confusing. Is anyone aware of an end-2-end example of simply running some R job in AWS using sage maker/docker? Thanks.

CodePudding user response：

This is Python code mixed with shell script magic calls (the !commands).

Magic commands aren't unique to this platform, you can use them in Jupyter, but this particular code is meant to be run on their platform. In what seems like a fairly convoluted way of running R scripts as processing jobs.

However, the only thing you really need to focus on is the R script, and the final two cell blocks. The instruction at the top (don't change this line) creates a file (preprocessing.R) which gets executed later, and then you can see the results.

Just run all the code cells in that order, with your own custom R code in the first cell. Note the line plot_key = "census_plot.png" in the last cell. This refers to the image being created in the R code. As for other output types (eg text) you'll have to look up the necessary Python package (PIL is an image manipulation package) and adapt accordingly.

Try this to get the CSV file that the R script is also generating (this code is not validated, so you might need to fix any problems that arise):

import csv

csv_key = "plot_data.csv"
csv_in_s3 = "{}/{}".format(preprocessed_csv_data, csv_key)
!aws s3 cp {csv_in_s3} .

file = open(csv_key)
dat = csv.reader(file)

display(dat)

So now you should have an idea of how two different output types the R script example generates are being handled, and from there you can try and adapt your own R code based on what it outputs.