We have been developing an enterprise application for the last two years. It is based on a microservice architecture: nine services, each with its own database, and an Angular frontend served by NGINX that calls the microservices. During development, we ran these services and their databases on a Hetzner cloud server with 4 GB RAM and 2 CPUs, communicating over the internal network, and everything worked seamlessly. We upload all images, PDFs, and videos to AWS S3, and that has been smooth sailing as well: videos of all sizes were uploaded and played without any issues.
We liked Hetzner and decided to go to production with them as well. We took the first server, installed Proxmox on it, and deployed LXC containers with our services. I tested again here and found no problems.
We then took a second server, installed Proxmox on it, and clustered the two. This is where the problem started: we hired a network engineer, who configured a bridged network between the containers on both nodes. Each container can ping the others, and telnet also connects over the internal network. The MTU set on this bridge is 1400.
Primary problem: we are NOT able to upload videos over 2 MB to S3 from this network anymore.
Other problems: these are intermittent issues, noted in the logs:
NGINX: 504 Gateway Time-out errors like the following, on multiple services --> upstream timed out (110: Connection timed out) while reading response header from upstream, client: 223.235.101.169, server: abc.xyz.com, request: "GET /courses/course HTTP/1.1", upstream: "http://10.10.XX.XX:8080//courses/course/toBeApprove", host: " abc.xyz.com, ", referrer: "https:// abc.xyz.com, /"
Tomcat- com.amazonaws.services.s3.model.AmazonS3Exception: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 7J2EHKVDWQP3367G; S3 Extended Request ID: xGGCQhESxh/Mo6ddwtGYShLIeCJYbgCRT8oGleQu/IfguEfbZpTQXG/AIzgLnG2F5YuCqk7vVE8=), S3 Extended Request ID: xGGCQhESxh/Mo6ddwtGYShLIeCJYbgCRT8oGleQu/IfguEfbZpTQXG/AIzgLnG2F5YuCqk7vVE8=
(we increased all known timeouts, both in nginx and tomcat)
MySQL- 2022-09-08T04:24:27.235964Z 8 [Warning] [MY-010055] [Server] IP address '10.10.XX.XX' could not be resolved: Name or service not known
Other key points to note: we allow video uploads of up to 100 MB, so the known limits are set accordingly in the NGINX and Tomcat configurations.
NGINX: client_max_body_size 100m;
Tomcat: <Connector port="8080" protocol="HTTP/1.1" maxPostSize="102400" maxHttpHeaderSize="102400" connectionTimeout="20000" redirectPort="8443" />
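As a quick unit check on the limits above: Tomcat documents `maxPostSize` and `maxHttpHeaderSize` in bytes, so the arithmetic below compares the configured 102400 with a 100 MB limit. This is only a sketch; the constant names are illustrative, and whether `maxPostSize` applies to multipart file parts at all is not established in this post.

```python
# Unit check for the size limits quoted above (all names are illustrative).
KIB = 1024
MIB = 1024 * KIB

connector_value = 102400      # value used for maxPostSize / maxHttpHeaderSize
intended_limit = 100 * MIB    # 100 MB, matching client_max_body_size 100m

print(connector_value // KIB, "KiB is what 102400 bytes amounts to")
print(intended_limit, "bytes would correspond to 100 MiB")
```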
During these tests, which ran over the last 15 days, we disabled all firewalls while debugging: ufw on the OS, the Proxmox firewall, and even the data center firewall.
This is our nginx.conf:
http {
proxy_http_version 1.1;
proxy_set_header Connection "";
##
client_body_buffer_size 16K;
client_header_buffer_size 1k;
client_max_body_size 100m;
client_header_timeout 100s;
client_body_timeout 100s;
large_client_header_buffers 4 16k;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 300;
send_timeout 600;
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
gzip on;
gzip_comp_level 2;
gzip_min_length 1000;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain application/x-javascript text/xml text/css application/xml;
These are our primary test/debugging trials.
**1. Testing with a small video (273 KB)**
a. Nginx log- clean, nothing related to operations
b. Tomcat log-
Start- CoursesServiceImpl - addCourse - Used Memory:73
add course 703
image file not null org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile@15476ca3
image save to s3 bucket
image folder name images
buckets3 lmsdev-cloudfront/images
image s3 bucket for call
imageUrl https://lmsdev-cloudfront.s3.amazonaws.com/images/703_4_istockphoto-1097843576-612x612.jpg
video file not null org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile@13419d27
video save to s3 bucket
video folder name videos
input Stream java.io.ByteArrayInputStream@4da82ff
buckets3 lmsdev-cloudfront/videos
video s3 bucket for call
video url https://lmsdev-cloudfront.s3.amazonaws.com/videos/703_4_giphy360p.mp4
Before Finally - CoursesServiceImpl - addCourse - Used Memory:126
After Finally- CoursesServiceImpl - addCourse - Used Memory:49
c. S3 bucket
[S3 bucket][1]
[1]: https://i.stack.imgur.com/T7daW.png
**2. Testing with a video of just under 2 MB**
a. The progress bar keeps running for about 5 minutes, and then the upload fails
b. Nginx logs-
2022/09/10 16:15:34 [error] 3698306#3698306: *24091 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 223.235.101.169, server: login.pathnam.education, request: "POST /courses/courses/course HTTP/1.1", upstream: "http://10.10.10.10:8080//courses/course", host: "login.pathnam.education", referrer: "https://login.pathnam.education/"
c. Tomcat logs-
Start- CoursesServiceImpl - addCourse - Used Memory:79
add course 704
image file not null org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile@352d57e3
image save to s3 bucket
image folder name images
buckets3 lmsdev-cloudfront/images
image s3 bucket for call
imageUrl https://lmsdev-cloudfront.s3.amazonaws.com/images/704_4_m_Maldives_dest_landscape_l_755_1487.webp
video file not null org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile@45bdb178
video save to s3 bucket
video folder name videos
input Stream java.io.ByteArrayInputStream@3a85dab9
And after a few minutes:
com.amazonaws.SdkClientException: Unable to execute HTTP request: Connection timed out (Write failed)
d. S3 Bucket – No entry
We then tried uploading the same video from our test server, and it was uploaded to the S3 bucket instantly.
We have read the posts about similar problems, but most of them relate to php.ini configuration and therefore do not apply to us.
User response:
I have solved the issue now. The MTU in the LXC container was set differently from the one configured on the virtual switch. Proxmox does not let you set the MTU while creating an LXC container (you would expect the bridge MTU to be used), so this is easy to miss.
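One way to spot this kind of mismatch is to ping across the bridge with the don't-fragment flag set and a payload sized to the MTU under test. The sketch below only builds the command lines and does not run them. It assumes Linux iputils `ping` and IPv4 without options; the helper names are illustrative, and the target address is simply the upstream seen in the NGINX log.

```python
from typing import List

IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one IPv4 packet of size `mtu`."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def probe_command(host: str, mtu: int, count: int = 3) -> List[str]:
    """Build a ping command with don't-fragment set (-M do) for the given MTU."""
    return ["ping", "-M", "do", "-s", str(max_ping_payload(mtu)),
            "-c", str(count), host]


if __name__ == "__main__":
    # Compare the bridge MTU (1400) with the common Ethernet default (1500).
    for mtu in (1400, 1500):
        print(mtu, "->", " ".join(probe_command("10.10.10.10", mtu)))
```

If the smaller payload gets replies and the larger one does not, the path between the containers carries less than 1500 bytes per packet. That would be consistent with small requests working while large uploads stall.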
Open the container's conf file; in my case the container ID is 100: nano /etc/pve/lxc/100.conf
Find this line: net0: name=eno1,bridge=vmbr4002,firewall=1,hwaddr=0A:14:98:05:8C:C5,ip=192.168.0.2/24,type=veth
Add the MTU value, matching the switch, at the end of the line: net0: name=eno1,bridge=vmbr4002,firewall=1,hwaddr=0A:14:98:05:8C:C5,ip=192.168.0.2/24,type=veth,mtu=1400 (1400 is the value set on my vSwitch)
Reboot the container so that the change takes effect permanently.
After that, everything worked like a charm for me. I hope this helps anyone else who creates containers through the Proxmox interface and therefore misses this setting, which has to be configured via the CLI (a suggested enhancement for Proxmox).
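The edit to the `net0` line described above can be sketched as a small string transformation. This is a hypothetical helper (`with_mtu` is not part of Proxmox) that only rewrites the text of the line; it does not touch `/etc/pve/lxc/100.conf` itself, and it assumes the comma-separated `key=value` layout shown in the example line.

```python
def with_mtu(net_line: str, mtu: int) -> str:
    """Return an LXC netN config line with mtu=<mtu> set exactly once."""
    # Split on the first colon only; hwaddr also contains colons.
    key, sep, value = net_line.partition(":")
    if not sep:
        raise ValueError("expected a line of the form 'net0: key=value,...'")
    # Keep every option except an existing mtu=..., then append the new one.
    options = [opt for opt in value.strip().split(",")
               if opt and not opt.startswith("mtu=")]
    options.append(f"mtu={mtu}")
    return f"{key}: " + ",".join(options)


line = ("net0: name=eno1,bridge=vmbr4002,firewall=1,"
        "hwaddr=0A:14:98:05:8C:C5,ip=192.168.0.2/24,type=veth")
print(with_mtu(line, 1400))
```

Because an existing `mtu=` option is dropped before the new one is appended, applying the helper twice leaves a single `mtu=` entry.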