AWSCLI Commands using Python

Time:12-04

I want to fetch the ClusterId, ClusterArn, and public DNS of an active EMR cluster and load them into a Postgres table. I am able to get the ClusterId and ARN using CLI commands in the console.

aws emr list-clusters --active --query "Clusters[*].{ClusterId:Id}" --output text
aws emr list-clusters --active --query "Clusters[*].{ClusterArn:ClusterArn}" --output text

After getting the cluster_id, I am able to fetch the DNS using this CLI command.

cluster_id=j-xxx
aws emr describe-cluster --output text --cluster-id $cluster_id --query Cluster.MasterPublicDnsName

But I have to do this in a Python script, and I am not able to integrate these commands into one. As a workaround, I ran the command below and redirected the output to a JSON file.

aws emr list-clusters --active > test.json

Contents of the test.json File -

{
    "Clusters": [
        {
            "Id": "j-xxx",
            "Name": "xxx",
            "Status": {
                "State": "WAITING",
                "StateChangeReason": {
                    "Message": "Cluster ready after last step completed."
                },
                "Timeline": {
                    "CreationDateTime": "2021-12-01T01:08:10.755000-06:00",
                    "ReadyDateTime": "2021-12-01T01:20:13.483000-06:00"
                }
            },
            "NormalizedInstanceHours": 832,
            "ClusterArn": "arn:aws:elasticmapreduce:xxx:xxx:cluster/j-xxx"
        }
    ]
}

Now reading that json file using Python -

import json
import psycopg2

with open("test.json") as file:
    data=json.load(file)
CId=data["Clusters"][0]["Id"]
CArn=data["Clusters"][0]["ClusterArn"]
print(CId)
print(CArn)
#CDNS=`aws emr describe-cluster --output text --cluster-id $CId --query Cluster.MasterPublicDnsName`
#print(CDNS)
conn = psycopg2.connect(
   database="postgres", user='xxx', password='xxx', host='xxxx.rds.amazonaws.com', port= '5432'
)
cursor = conn.cursor()
query = '''INSERT INTO STAGE.EMR_CLUSTER_INFO (Cluster_ID, Cluster_Arn, Public_DNS) VALUES (%s,%s,%s)'''
values = (CId, CArn, 'ip-xxxx.ec2.internal')  # I wasn't able to fetch the DNS, so it is hardcoded just to test whether the record gets inserted into the table
cursor.execute(query,values)
conn.commit()
print("Records inserted........")
conn.close()

I was able to insert the record into the table. But I need to fetch the ClusterId, ARN, and DNS in the same script and then load the values into the table. I tried using Boto3 but could not get it to work. Please help. Thanks in advance.

CodePudding user response:

Is this what you are looking for? See the question "How do I list all running EMR clusters using Boto?" and https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html#client

With boto3 you can fetch the MasterPublicDnsName, Id, and ClusterArn. If you need to know more, let me know.
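
If you would rather keep the CLI commands themselves, one option is to call them from Python through subprocess. This is only a minimal sketch, assuming the aws CLI is on PATH and credentials are already configured. The names aws_json and active_cluster_info and the run parameter are made up for illustration and are not part of any library.

```python
import json
import subprocess


def aws_json(args, run=subprocess.run):
    """Run `aws <args> --output json` and return the parsed JSON output.

    `run` is a hypothetical hook so the command runner can be swapped out;
    by default it is subprocess.run.
    """
    result = run(
        ["aws", *args, "--output", "json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return json.loads(result.stdout)


def active_cluster_info(run=subprocess.run):
    """Return (cluster_id, cluster_arn, master_public_dns) per active cluster."""
    rows = []
    listing = aws_json(["emr", "list-clusters", "--active"], run=run)
    for summary in listing["Clusters"]:
        detail = aws_json(
            ["emr", "describe-cluster", "--cluster-id", summary["Id"]], run=run
        )
        # The DNS name may be missing while a cluster is still starting up.
        dns = detail["Cluster"].get("MasterPublicDnsName")
        rows.append((summary["Id"], summary["ClusterArn"], dns))
    return rows
```

Calling active_cluster_info() with no arguments shells out to the real aws binary, so it only works where the CLI is installed. The resulting tuples have the same shape as the values tuple passed to cursor.execute() in the question.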

CodePudding user response:

From boto3 describe_cluster():

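import boto3

emr_client = boto3.client('emr')

# These states match what `aws emr list-clusters --active` returns
active_states = ['STARTING', 'BOOTSTRAPPING', 'RUNNING', 'WAITING', 'TERMINATING']
clusters = emr_client.list_clusters(ClusterStates=active_states)

for cluster in clusters['Clusters']:
    cluster_id = cluster['Id']

    # describe_cluster() returns the full details, including the master DNS name
    response = emr_client.describe_cluster(ClusterId=cluster_id)

    cluster_arn = response['Cluster']['ClusterArn']
    cluster_dns_name = response['Cluster']['MasterPublicDnsName']

    # Insert into database here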
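
The loop above stops at the database step. One way to finish it, reusing the INSERT statement from the question, might look like the sketch below. The function names collect_cluster_rows and load_cluster_rows are hypothetical. The sketch assumes the emr client comes from boto3.client('emr') and the connection from psycopg2.connect(...); both are passed in as arguments so the code does not import either library itself.

```python
# States covered by the CLI's --active shortcut
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING", "TERMINATING"]

# Same table and columns as the INSERT in the question
INSERT_SQL = (
    "INSERT INTO STAGE.EMR_CLUSTER_INFO (Cluster_ID, Cluster_Arn, Public_DNS) "
    "VALUES (%s, %s, %s)"
)


def collect_cluster_rows(emr_client):
    """Return (id, arn, master_public_dns) for every active cluster."""
    rows = []
    # The paginator follows the Marker token, so long cluster lists are not cut off.
    paginator = emr_client.get_paginator("list_clusters")
    for page in paginator.paginate(ClusterStates=ACTIVE_STATES):
        for summary in page["Clusters"]:
            detail = emr_client.describe_cluster(ClusterId=summary["Id"])["Cluster"]
            rows.append(
                (
                    detail["Id"],
                    detail["ClusterArn"],
                    # May be absent while the cluster is still starting
                    detail.get("MasterPublicDnsName"),
                )
            )
    return rows


def load_cluster_rows(conn, rows):
    """Insert all rows with one executemany() call, then commit."""
    with conn.cursor() as cursor:
        cursor.executemany(INSERT_SQL, rows)
    conn.commit()


# Usage (needs boto3, psycopg2, AWS credentials and a reachable database):
#   import boto3, psycopg2
#   rows = collect_cluster_rows(boto3.client("emr"))
#   conn = psycopg2.connect(database="postgres", user="xxx", password="xxx",
#                           host="xxxx.rds.amazonaws.com", port="5432")
#   load_cluster_rows(conn, rows)
#   conn.close()
```

Passing the client and connection in keeps the AWS part and the database part separate, so each can be tried on its own before wiring them together.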