AWS RDS slow connection to public IP over internet


We're seeing very slow throughput from RDS (Oracle) when connecting to its public IP over the internet (from outside the VPC, on a 1 Gbps internet connection). The RDS instance is in a public subnet with a public IP enabled. Running a simple SELECT on a large table, I get about 300 rows/s when connected to the database via its public IP from my computer, but about 10,000 rows/s for the same query from an EC2 instance inside the same VPC. I'm located in France and the RDS database is in the Frankfurt region, so I'm guessing this is a network bottleneck.

What I've tried so far:

  • EC2 in same VPC -> RDS (via private IP): ~10,000 rows/s
  • EC2 in same region -> RDS (via public IP): ~10,000 rows/s
  • Local computer -> RDS (via public IP): ~300 rows/s
  • Local computer -> SSH through an EC2 bastion in the same VPC as RDS -> RDS (via private IP): ~300 rows/s
  • Copy of the DB in the Paris region, accessed over its public IP from my computer: ~1,000 rows/s

The instance is a db.t3.small with a 100 GB gp2 volume. I've tried upgrading to a db.m5.large with a larger 500 GB volume (which also increases the IOPS), but I get exactly the same results, so I still appear to be limited by the network.

The thing is, we have to copy large amounts of data on a regular basis from an on-premises database to the RDS database. From these results it looks like a network bottleneck, but I wanted to know if anyone has had the same use case and found a workaround. I've searched Stack Overflow (and others) and found similar questions with no answers :(

I would prefer to avoid a costly Direct Connect link, and I'm not sure an AWS Site-to-Site VPN connection would speed things up...

Thanks!

CodePudding user response:

This is the result of bandwidth and latency, caused by the 'distance' of your connection to the database.

AWS has plenty of bandwidth, but there are network limits that depend on the size of the database instance. The bigger impact, however, is most likely latency caused by the distance between your computer (on the internet) and the database. That latency comes both from physical distance and from the number of 'hops' between systems required to connect the two endpoints.
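As a rough illustration of why latency dominates here (the round-trip times and batch sizes below are assumptions, not measurements), the number of rows fetched per network round trip puts a hard ceiling on the row rate, regardless of bandwidth:

```python
# Back-of-envelope: each fetch is one network round trip, so the row rate is
# capped at (rows per round trip) / RTT, no matter how much bandwidth is available.
def max_rows_per_second(rows_per_roundtrip: int, rtt_seconds: float) -> float:
    return rows_per_roundtrip / rtt_seconds

# Illustrative numbers only (assumed, not measured):
print(max_rows_per_second(10, 0.001))    # in-VPC, ~1 ms RTT           -> ~10,000 rows/s
print(max_rows_per_second(10, 0.025))    # France<->Frankfurt, ~25 ms  -> ~400 rows/s
print(max_rows_per_second(1000, 0.025))  # same RTT, bigger batches    -> ~40,000 rows/s (then bandwidth-bound)
```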

Further, the protocol used to SELECT the data is not efficient for bulk transfer. It is probably quite 'chatty', waiting for acknowledgements after each batch of rows, and the data is most likely sent uncompressed.
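If the client is fetching rows in small batches, increasing the fetch size reduces the number of round trips. A minimal sketch with python-oracledb (the connection details and table name are placeholders; other client libraries expose an equivalent fetch/array size setting):

```python
import oracledb  # pip install oracledb (successor to cx_Oracle)

# Placeholder connection details.
conn = oracledb.connect(user="scott", password="tiger",
                        dsn="mydb.xxxxxxxx.eu-central-1.rds.amazonaws.com:1521/ORCL")

cur = conn.cursor()
cur.arraysize = 5000      # fetch 5000 rows per round trip instead of the default 100
cur.prefetchrows = 5001   # prefetch one extra row so the end-of-data check costs no extra trip

cur.execute("SELECT * FROM big_table")  # hypothetical table
while True:
    rows = cur.fetchmany()  # returns up to cur.arraysize rows per round trip
    if not rows:
        break
    # ... process rows ...
```

Even with large fetch batches, a single long-haul connection will still be slower than an in-VPC one, so this mitigates the latency penalty rather than removing it.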

Since your goal is to "copy large amounts of data on a regular basis", I would recommend the following (a rough sketch follows the list):

  • Extract the data 'locally' (if you are extracting from Amazon RDS, then dump the data into S3)
  • Compress the data (if possible)
  • Copy the data to the destination (eg to/from S3)
  • Decompress and load the data into the database 'locally'
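Here is a minimal sketch of that pipeline in Python, assuming python-oracledb and boto3 are available and that the load step runs on an EC2 instance in the same VPC as RDS; the bucket, table, column and DSN names are all placeholders:

```python
import csv
import gzip

import boto3
import oracledb

BUCKET = "my-transfer-bucket"         # placeholder S3 bucket
KEY = "exports/big_table.csv.gz"      # placeholder object key
LOCAL_FILE = "/tmp/big_table.csv.gz"

def extract_compress_upload(onprem_dsn: str, user: str, password: str) -> None:
    """Runs on-premises: dump the table to gzipped CSV and push it to S3."""
    with oracledb.connect(user=user, password=password, dsn=onprem_dsn) as conn:
        cur = conn.cursor()
        cur.arraysize = 10_000  # large fetch batches, few round trips
        cur.execute("SELECT id, payload FROM big_table")  # hypothetical table/columns
        with gzip.open(LOCAL_FILE, "wt", newline="") as f:
            writer = csv.writer(f)
            while True:
                rows = cur.fetchmany()
                if not rows:
                    break
                writer.writerows(rows)
    boto3.client("s3").upload_file(LOCAL_FILE, BUCKET, KEY)

def download_decompress_load(rds_dsn: str, user: str, password: str) -> None:
    """Runs on an EC2 instance next to RDS: pull from S3 and bulk insert."""
    boto3.client("s3").download_file(BUCKET, KEY, LOCAL_FILE)
    with oracledb.connect(user=user, password=password, dsn=rds_dsn) as conn:
        cur = conn.cursor()
        with gzip.open(LOCAL_FILE, "rt", newline="") as f:
            batch = []
            for row in csv.reader(f):
                batch.append(row)
                if len(batch) >= 10_000:
                    cur.executemany("INSERT INTO big_table (id, payload) VALUES (:1, :2)", batch)
                    batch.clear()
            if batch:
                cur.executemany("INSERT INTO big_table (id, payload) VALUES (:1, :2)", batch)
        conn.commit()
```

For Oracle specifically, a Data Pump export/import (moving the dump files through S3) will usually be faster than CSV plus INSERTs; the sketch above is only meant to show the shape of the pipeline.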

This avoids the 'chatty' nature of a SELECT over a high-latency link, but it might increase the total time from extract to load, since the data goes through multiple extract/compress/copy steps.

You could also look at tools that perform incremental extracts of data, so that the whole dataset is not copied every time (eg AWS Database Migration Service, which supports ongoing change data capture).
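If you go down the DMS route, a task that does a full load followed by ongoing change data capture (CDC) keeps subsequent copies incremental. A rough boto3 sketch, assuming the source/target endpoints and the replication instance already exist (all ARNs and the schema name are placeholders):

```python
import json

import boto3

dms = boto3.client("dms", region_name="eu-central-1")

# Only replicate the schema we care about (placeholder schema name).
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-my-schema",
        "object-locator": {"schema-name": "MYSCHEMA", "table-name": "%"},
        "rule-action": "include",
    }]
}

task = dms.create_replication_task(
    ReplicationTaskIdentifier="onprem-to-rds-incremental",
    SourceEndpointArn="arn:aws:dms:eu-central-1:123456789012:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:eu-central-1:123456789012:endpoint:TARGET",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:eu-central-1:123456789012:rep:INSTANCE",  # placeholder
    MigrationType="full-load-and-cdc",  # initial copy, then only the changes
    TableMappings=json.dumps(table_mappings),
)

dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)
```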
