I have an Amazon serverless Aurora SQL database instance with some debt installments data. I was trying to connect on the DB with AWS Lambda (python 3.7) and found this method:
import boto3
rds_client = boto3.client('rds-data')
database_name = 'dbname'
db_cluster_arn = 'arn:aws:rds:us-east-1:xxxx:cluster:xxxx'
db_credentials_secrets_store_arn = 'arn:aws:secretsmanager:us-east-1:xxxx:secret:rds-db-credentials/cluster-xxxx/'
def lambda_handler(event, context):
response = execute_statement('SELECT * FROM focafidc.estoque');
json_string = str(response)
return response
def execute_statement(sql):
response = rds_client.execute_statement(
secretArn=db_credentials_secrets_store_arn,
database=database_name,
resourceArn=db_cluster_arn,
sql=sql
)
return response;
The response returns something like a JSON with data nested in dictionaries:
{
"ResponseMetadata": {
"RequestId": "f7df6de2-8144-4b7b-9cf0-c828454b4a0d",
"HTTPStatusCode": 200,
"HTTPHeaders": {
"x-amzn-requestid": "f7df6de2-8144-4b7b-9cf0-c828454b4a0d",
"content-type": "application/json",
"content-length": "324685",
"date": "Tue, 10 May 2022 13:51:57 GMT"
},
"RetryAttempts": 0
},
"numberOfRecordsUpdated": 0,
"records": [
[
{
"stringValue": "2022-05-02"
},
{
"longValue": 1
},
{
"longValue": 1
},
{
"stringValue": "a3789"
},
{
"stringValue": "519.60"
},
{
"stringValue": "2023-05-02"
},
{
"stringValue": "2598.00"
},
{
"longValue": 666000002
},
{
"stringValue": "1.88"
},
{
"stringValue": "b190"
},
{
"stringValue": "1996-03-25"
},
{
"stringValue": "Brasileiro"
},
{
"stringValue": "masculino"
},
{
"stringValue": "false"
},
{
"stringValue": "SP"
},
{
"stringValue": "São Paulo"
},
{
"longValue": 111111111
},
{
"longValue": 1111111111
},
{
"booleanValue": true
},
{
"stringValue": "LOJAS S.A."
},
{
"stringValue": "99999999999999"
}
],
[
{
"stringValue": "2022-05-02"
},
{
"longValue": 1
},
{
"longValue": 2
},
{
"stringValue": "a3789"
},
{
"stringValue": "519.60"
},
{
"stringValue": "2024-05-01"
},
{
"stringValue": "2598.00"
},
{
"longValue": 666000002
},
{
"stringValue": "1.88"
},
{
"stringValue": "b190"
},
{
"stringValue": "1996-03-25"
},
{
"stringValue": "Brasileiro"
},
{
"stringValue": "masculino"
},
{
"stringValue": "false"
},
{
"stringValue": "SP"
},
{
"stringValue": "São Paulo"
},
{
"longValue": 111111111
},
{
"longValue": 1111111111
},
{
"booleanValue": true
},
{
"stringValue": "LOJAS S.A."
},
{
"stringValue": "99999999999999"
}
],
[
{
"stringValue": "2022-05-02"
},
{
"longValue": 1
},
{
"longValue": 3
},
{
"stringValue": "a3789"
},
{
"stringValue": "519.60"
},
{
"stringValue": "2025-05-01"
},
{
"stringValue": "2598.00"
},
{
"longValue": 666000002
},
{
"stringValue": "1.88"
},
{
"stringValue": "b190"
},
{
"stringValue": "1996-03-25"
},
{
"stringValue": "Brasileiro"
},
{
"stringValue": "masculino"
},
{
"stringValue": "false"
},
{
"stringValue": "SP"
},
{
"stringValue": "São Paulo"
},
{
"longValue": 111111111
},
{
"longValue": 1111111111
},
{
"booleanValue": true
},
{
"stringValue": "LOJAS S.A."
},
{
"stringValue": "99999999999999"
}
],
[
{
"stringValue": "2022-05-02"
},
{
"longValue": 1
},
{
"longValue": 4
},
{
"stringValue": "a3789"
},
{
"stringValue": "519.60"
},
{
"stringValue": "2026-05-01"
},
{
"stringValue": "2598.00"
},
{
"longValue": 666000002
},
{
"stringValue": "1.88"
},
{
"stringValue": "b190"
},
{
"stringValue": "1996-03-25"
},
{
"stringValue": "Brasileiro"
},
{
"stringValue": "masculino"
},
{
"stringValue": "false"
},
{
"stringValue": "SP"
},
{
"stringValue": "São Paulo"
},
{
"longValue": 111111111
},
{
"longValue": 1111111111
},
{
"booleanValue": true
},
{
"stringValue": "LOJAS S.A."
},
{
"stringValue": "99999999999999"
}
],
[
{
"stringValue": "2022-05-02"
},
{
"longValue": 1
},
{
"longValue": 5
},
{
"stringValue": "a3789"
},
{
"stringValue": "519.60"
},
{
"stringValue": "2027-05-01"
},
{
"stringValue": "2598.00"
},
{
"longValue": 666000002
},
{
"stringValue": "1.88"
},
{
"stringValue": "b190"
},
{
"stringValue": "1996-03-25"
},
{
"stringValue": "Brasileiro"
},
{
"stringValue": "masculino"
},
{
"stringValue": "false"
},
{
"stringValue": "SP"
},
{
"stringValue": "São Paulo"
},
{
"longValue": 111111111
},
{
"longValue": 1111111111
},
{
"booleanValue": true
},
{
"stringValue": "LOJAS S.A."
},
{
"stringValue": "99999999999999"
}
]
]
}
I need this data to be a pandas dataframe, so I tried to json_normalize the response JSON and got the following result:
bd1 = pd.json_normalize(response,['records'])
print(bd1)
0 ... 20
0 {'stringValue': '2022-05-02'} ... {'stringValue': '99999999999999'}
1 {'stringValue': '2022-05-02'} ... {'stringValue': '99999999999999'}
2 {'stringValue': '2022-05-02'} ... {'stringValue': '99999999999999'}
3 {'stringValue': '2022-05-02'} ... {'stringValue': '99999999999999'}
4 {'stringValue': '2022-05-02'} ... {'stringValue': '99999999999999'}
Can you guys suggest any method to create or convert this to an only values Dataframe?
CodePudding user response:
How about we first parsing the records to standard python objects, and then we handle the JSON-like python structure to dataframe. Assume that you've already parsed the records to nested-list of dicts like the following:
true, false, null = True, False, None
records = [
[
{
"stringValue": "2022-05-02"
},
{
"longValue": 1
},
{
"longValue": 1
},
{
"stringValue": "a3789"
},
{
"stringValue": "519.60"
},
{
"stringValue": "2023-05-02"
},
{
"stringValue": "2598.00"
},
{
"longValue": 666000002
},
{
"stringValue": "1.88"
},
{
"stringValue": "b190"
},
{
"stringValue": "1996-03-25"
},
{
"stringValue": "Brasileiro"
},
{
"stringValue": "masculino"
},
{
"stringValue": "false"
},
{
"stringValue": "SP"
},
{
"stringValue": "São Paulo"
},
{
"longValue": 111111111
},
{
"longValue": 1111111111
},
{
"booleanValue": true
},
{
"stringValue": "LOJAS S.A."
},
{
"stringValue": "99999999999999"
}
],
[
{
"stringValue": "2022-05-03"
},
{
"longValue": 1
},
{
"longValue": 2
},
{
"stringValue": "a3789"
},
{
"stringValue": "519.60"
},
{
"stringValue": "2024-05-01"
},
{
"stringValue": "2598.00"
},
{
"longValue": 666000002
},
{
"stringValue": "1.88"
},
{
"stringValue": "b190"
},
{
"stringValue": "1996-03-25"
},
{
"stringValue": "Brasileiro"
},
{
"stringValue": "masculino"
},
{
"stringValue": "false"
},
{
"stringValue": "SP"
},
{
"stringValue": "São Paulo"
},
{
"longValue": 111111111
},
{
"longValue": 1111111111
},
{
"booleanValue": true
},
{
"stringValue": "LOJAS S.A."
},
{
"stringValue": "99999999999999"
}
],
]
Here we start to extract the values you focus on:
def first(seq):
return next(iter(seq))
import pandas as pd
records_values = [[first(item.values()) for item in record] for record in records]
df = pd.DataFrame(records_values)
print(df)
The Output is:
0 1 2 3 ... 17 18 19 20
0 2022-05-02 1 1 a3789 ... 1111111111 True LOJAS S.A. 99999999999999
1 2022-05-03 1 2 a3789 ... 1111111111 True LOJAS S.A. 99999999999999
[2 rows x 21 columns]
And if you want to keep the correspond value types, you can extract the value types from one item of the records like this, and do the related type-casting later in pandas:
candidate = records[0]
value_types = [first(item.keys()) for item in candidate]
# ['stringValue', 'longValue', 'longValue', 'stringValue', 'stringValue', 'stringValue', 'stringValue',
# 'longValue', 'stringValue', 'stringValue', 'stringValue', 'stringValue', 'stringValue', 'stringValue',
# 'stringValue', 'stringValue', 'longValue', 'longValue', 'booleanValue', 'stringValue', 'stringValue']