I have a simple NFS server (set up following the instructions here) exposed to a Kubernetes (v1.24.2) cluster through a storage class. When a new PVC is created, a PV is created as expected, along with a new directory on the NFS server.
The NFS provisioner was deployed as instructed here.
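For context, the storage class consumed below looks roughly like the sketch that follows. It assumes the kubernetes-sigs nfs-subdir-external-provisioner; the exact provisioner string and parameters depend on how it was actually deployed:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner  # assumed; check your deployment
parameters:
  archiveOnDelete: "false"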
My issue is that containers don't seem to be able to perform all of the I/O operations they expect when interacting with the NFS-backed volume. For example:
A PVC is created with the following YAML (the provisioner then creates a matching PV):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql-data
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
This creates a directory on the NFS server as expected.
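(With the provisioner assumed above, the directory created under the export is typically named from the namespace, PVC name, and PV name, along the lines of:

/srv/<namespace>-<pvc-name>-<pv-name>

so for this claim something like /srv/default-mssql-data-pvc-<uuid>.)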
Then this deployment is created to use the PVC:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
    spec:
      terminationGracePeriodSeconds: 30
      hostname: mssqlinst
      securityContext:
        fsGroup: 10001
      containers:
        - name: mssql
          image: mcr.microsoft.com/mssql/server:2019-latest
          ports:
            - containerPort: 1433
          env:
            - name: MSSQL_PID
              value: "Developer"
            - name: ACCEPT_EULA
              value: "Y"
            - name: SA_PASSWORD
              value: "Password123"
          volumeMounts:
            - name: mssqldb
              mountPath: /var/opt/mssql
      volumes:
        - name: mssqldb
          persistentVolumeClaim:
            claimName: mssql-data
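One detail worth noting in the manifest above: recent mssql/server images run as a non-root mssql user, so the fsGroup has to line up with the ownership the NFS export presents. A sketch of a more explicit pod security context, assuming the image's mssql user is uid/gid 10001:

securityContext:
  runAsUser: 10001   # assumed: uid of the mssql user in this image
  runAsGroup: 10001  # assumed: gid of the mssql user
  fsGroup: 10001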
The server comes up and responds to requests, but does so with the following error:
[S0002][823] com.microsoft.sqlserver.jdbc.SQLServerException: The operating system returned error 1117(The request could not be performed because of an I/O device error.) to SQL Server during a read at offset 0x0000000009a000 in file '/var/opt/mssql/data/master.mdf'. Additional messages in the SQL Server error log and operating system error log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
My /etc/exports file has the following contents:
/srv *(rw,no_subtree_check,no_root_squash)
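For reference, these are standard NFS export options (see exports(5)):

/srv *(rw,no_subtree_check,no_root_squash)
#      rw               - allow read and write access
#      no_subtree_check - disable subtree checking on each request
#      no_root_squash   - do not map client root to the anonymous user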
When the SQL container starts, the pod itself doesn't restart, but the SQL Server process inside the container appears to get into some sort of restart loop; once a connection is attempted, it throws the error above and appears to stop.
Is there something I'm missing in the /etc/exports file? I tried variations with sync, async, and insecure but can't seem to get past the SQL error.
I gather from the error that this has something to do with the container's ability to read/write from/to the disk. Am I in the right ballpark?
CodePudding user response:
The /etc/exports config that ended up working was:
/srv *(rw,no_root_squash,insecure,sync,no_subtree_check)
This was after a reinstall of the cluster. There were no significant changes elsewhere, so it still seems like there may have been more to the issue than this one line of config.
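For anyone hitting the same symptom, the export options can also be complemented by NFS mount options pinned on the storage class side. A sketch, assuming the nfs-client StorageClass created by the nfs-subdir-external-provisioner; the specific options are things to experiment with rather than a guaranteed fix:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner  # assumed
mountOptions:
  - hard         # retry I/O indefinitely rather than surfacing transient errors to the app
  - nfsvers=4.1  # example: pin the NFS protocol version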