I want to retrieve items from a DynamoDB table and then append them after the last row of a table in BigQuery.
import boto3
import pandas as pd
from boto3.dynamodb.conditions import Attr

# Table() is a method of the resource API, not of the low-level client
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('table')
response = table.scan(FilterExpression=Attr('created_at').gt(max_date_of_the_table_in_big_query))
#first part
data = response['Items']
#second part
while response.get('LastEvaluatedKey'):
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])
df=pd.DataFrame(data)
df=df[['query','created_at','result_count','id','isfuzy']]
# load df to big query
.....
The date filter works correctly, but in the while loop (second part) the code retrieves all items. After the first part I have 100 rows, but after this code:
while response.get('LastEvaluatedKey'):
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])
I have 500,000 rows. I could use only the first part, but I know a single scan is limited to 1 MB, which is why I am using the second part. How can I get only the data in the given date range?
CodePudding user response:
Your first scan API call has a FilterExpression set, which applies your data filter:
response = table.scan(FilterExpression=Attr('created_at').gt(max_date_of_the_table_in_big_query))
However, the 2nd scan API call doesn't have one set and thus is not filtering your data:
response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
Apply the FilterExpression to both calls:
while response.get('LastEvaluatedKey'):
    response = table.scan(
        ExclusiveStartKey=response['LastEvaluatedKey'],
        FilterExpression=Attr('created_at').gt(max_date_of_the_table_in_big_query)
    )
    data.extend(response['Items'])
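To avoid repeating the filter by hand, the pagination could be wrapped in a small helper that reuses the same keyword arguments on every page. This is only a minimal sketch: scan_all and FakeTable are hypothetical names that do not appear in your code, and FakeTable is an in-memory stand-in whose filter is a plain Python callable rather than a boto3 condition, used only so the example runs without AWS access. It also mimics the fact that DynamoDB applies the filter after reading each page, so a page can come back empty and still carry a LastEvaluatedKey.

```python
def scan_all(table, **scan_kwargs):
    """Collect the items of every page, passing the same kwargs on each call."""
    items = []
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Only the start key changes between pages; the filter stays in place.
        scan_kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical stand-in for a boto3 Table (assumption, for demonstration).

    Pages are cut by row count instead of the real 1 MB limit, and the
    filter is a callable applied after the page has been read.
    """

    def __init__(self, rows, page_size):
        self.rows = rows
        self.page_size = page_size

    def scan(self, FilterExpression=None, ExclusiveStartKey=None):
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["pos"]
        end = start + self.page_size
        page = self.rows[start:end]
        items = [r for r in page if FilterExpression is None or FilterExpression(r)]
        response = {"Items": items}
        if end < len(self.rows):
            response["LastEvaluatedKey"] = {"pos": end}
        return response


rows = [{"id": i, "created_at": i} for i in range(1, 7)]
fake_table = FakeTable(rows, page_size=2)

# Filter passed once, applied on every page.
filtered = scan_all(fake_table, FilterExpression=lambda r: r["created_at"] > 4)

# No filter at all: every row comes back, as in the original while loop.
unfiltered = scan_all(fake_table)
```

With your real table the call would look like scan_all(table, FilterExpression=Attr('created_at').gt(max_date_of_the_table_in_big_query)), after which data can be passed to pd.DataFrame as before.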