Problem
I am trying to troubleshoot the following message.
time="<timestamp>" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: <uuid>"
I get this message by running kubectl logs external-dns-xxxxxxxxxx-xxxxx.
My Question
I am trying to figure out:
- Where does this message get generated? I can't tell whether it comes from my service, serviceaccount, clusterrole, clusterrolebinding, the pod, or something else. Any clarification or links to useful explanations would be appreciated. (My guess right now is the pod, based on the k8s documentation, but I'm still not positive, and I'm not sure how to trace it to confirm.)
- Why are the IAM permissions I've explicitly specified not being assumed by my external-dns? Any explanation of the flow by which the external-dns pod assumes its role and carries out its tasks would be GREATLY appreciated! (See the tracing sketch after this list.)
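On the first point: the message comes from the external-dns container itself. kubectl logs prints the container's stdout/stderr, and external-dns logs this error when the AWS SDK inside it fails to exchange the pod's projected service account token for role credentials via sts:AssumeRoleWithWebIdentity. One way to trace it by hand is to replay that STS call; this is only a sketch, and it assumes the aws CLI is available in the container (the Bitnami image may not ship it, in which case copy the token out and run the call from a workstation):

# Replay the exact STS call the AWS SDK makes. AWS_ROLE_ARN and
# AWS_WEB_IDENTITY_TOKEN_FILE are injected into the pod (see the
# describe output below).
kubectl exec -it external-dns-xxxxxxxxxx-xxxxx -- sh -c '
  TOKEN=$(cat "$AWS_WEB_IDENTITY_TOKEN_FILE")
  # If this returns the same 403 AccessDenied, the role trust policy
  # (not the permissions policy) is what is rejecting the pod.
  aws sts assume-role-with-web-identity \
    --role-arn "$AWS_ROLE_ARN" \
    --role-session-name irsa-debug \
    --web-identity-token "$TOKEN"
'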
My Goal
I'm pretty new to K8s, and am trying to deploy an EKS cluster with external-dns to allow for automated management of my Route53 records.
What I've tried so far
- I've experimented with expanding the IAM permissions, opening them up as wide as I could.
- I've explicitly added the eks.amazonaws.com/role-arn annotation to all of my resources.
- I've tried moving the external-dns deployment from the kube-system namespace to default, since that was recommended on a GitHub issue with the same error message.
Deployment Details
I'm using Terraform to deploy most of this: the EKS cluster, node group, OIDC provider, and the Helm release.
For now I've opted to share the results of the deployment rather than the configs, to keep the size of this question down. If you'd like to see the configs, just ask and I'll share everything I have.
Kubectl Descriptions
kubectl describe service external-dns
Name: external-dns
Namespace: default
Labels: app.kubernetes.io/instance=external-dns
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=external-dns
helm.sh/chart=external-dns-6.9.0
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
meta.helm.sh/release-name: external-dns
meta.helm.sh/release-namespace: default
Selector: app.kubernetes.io/instance=external-dns,app.kubernetes.io/name=external-dns
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.20.233.113
IPs: 172.20.233.113
Port: http 7979/TCP
TargetPort: http/TCP
Endpoints: 10.12.13.93:7979
Session Affinity: None
Events: <none>
kubectl describe serviceaccount external-dns
Name: external-dns
Namespace: default
Labels: app.kubernetes.io/managed-by=Helm
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
meta.helm.sh/release-name: external-dns
meta.helm.sh/release-namespace: default
Image pull secrets: <none>
Mountable secrets: external-dns-token-twgpb
Tokens: external-dns-token-twgpb
Events: <none>
kubectl describe clusterrole external-dns
Name: external-dns
Labels: <none>
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
endpoints [] [] [get watch list]
nodes [] [] [get watch list]
pods [] [] [get watch list]
services [] [] [get watch list]
ingresses.extensions [] [] [get watch list]
gateways.networking.istio.io [] [] [get watch list]
ingresses.networking.k8s.io [] [] [get watch list]
kubectl describe clusterrolebindings.rbac.authorization.k8s.io external-dns
Name: external-dns
Labels: <none>
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
Role:
Kind: ClusterRole
Name: external-dns
Subjects:
Kind Name Namespace
---- ---- ---------
ServiceAccount external-dns default
kubectl describe ingress -n kube-system
Name: aws-lb-ctrlr
Labels: <none>
Namespace: kube-system
Address:
Ingress Class: <none>
Default backend: <default>
Rules:
Host Path Backends
---- ---- --------
*
/* aws-load-balancer-controller:80 (<error: endpoints "aws-load-balancer-controller" not found>)
Annotations: alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
alb.ingress.kubernetes.io/listen-ports: [{'HTTP': 80}]
alb.ingress.kubernetes.io/scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: <my-domain.tld>
kubernetes.io/ingress.class: alb
Events: <none>
kubectl describe pod
Name: external-dns-xxxxxxxxxx-xxxxx
Namespace: default
Priority: 0
Service Account: external-dns
Node: ip-10-12-13-107.ec2.internal/10.12.13.107
Start Time: Tue, 20 Sep 2022 10:48:06 -0400
Labels: app.kubernetes.io/instance=external-dns
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=external-dns
helm.sh/chart=external-dns-6.9.0
pod-template-hash=xxxxxxxxxx
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 10.12.13.93
IPs:
IP: 10.12.13.93
Controlled By: ReplicaSet/external-dns-xxxxxxxxxx
Containers:
external-dns:
Container ID: docker://5b49f49f7b9c0be8cb00835f117eedccaff3d5bb4ebfecb4bc6af771d2b3d336
Image: docker.io/bitnami/external-dns:0.12.2-debian-11-r14
Image ID: docker-pullable://bitnami/external-dns@sha256:195dec0f60c9137952ea0604623c7eb001ece4142916bdfb0cc79f5d9cdc4b62
Port: 7979/TCP
Host Port: 0/TCP
Args:
--metrics-address=:7979
--log-level=debug
--log-format=text
--domain-filter=<my-domain.tld>
--policy=sync
--provider=aws
--registry=txt
--interval=1m
--txt-owner-id=<hosted-zone-id>
--source=service
--source=ingress
--aws-api-retries=3
--aws-zone-type=public
--aws-assume-role=arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
--aws-batch-change-size=1000
State: Running
Started: Tue, 20 Sep 2022 10:48:13 -0400
Ready: True
Restart Count: 0
Liveness: http-get http://:http/healthz delay=10s timeout=5s period=10s #success=1 #failure=2
Readiness: http-get http://:http/healthz delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
AWS_DEFAULT_REGION: us-east-1
AWS_STS_REGIONAL_ENDPOINTS: regional
AWS_ROLE_ARN: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d82r7 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
kube-api-access-d82r7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m44s default-scheduler Successfully assigned default/external-dns-xxxxxxxxxx-xxxxx to ip-10-12-13-107.ec2.internal
Normal Pulling 3m43s kubelet Pulling image "docker.io/bitnami/external-dns:0.12.2-debian-11-r14"
Normal Pulled 3m40s kubelet Successfully pulled image "docker.io/bitnami/external-dns:0.12.2-debian-11-r14" in 3.588418583s
Normal Created 3m38s kubelet Created container external-dns
Normal Started 3m37s kubelet Started container external-dns
kubectl describe deployments.apps
Name: external-dns
Namespace: default
CreationTimestamp: Tue, 20 Sep 2022 10:48:06 -0400
Labels: app.kubernetes.io/instance=external-dns
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=external-dns
helm.sh/chart=external-dns-6.9.0
Annotations: deployment.kubernetes.io/revision: 1
meta.helm.sh/release-name: external-dns
meta.helm.sh/release-namespace: default
Selector: app.kubernetes.io/instance=external-dns,app.kubernetes.io/name=external-dns
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app.kubernetes.io/instance=external-dns
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=external-dns
helm.sh/chart=external-dns-6.9.0
Service Account: external-dns
Containers:
external-dns:
Image: docker.io/bitnami/external-dns:0.12.2-debian-11-r14
Port: 7979/TCP
Host Port: 0/TCP
Args:
--metrics-address=:7979
--log-level=debug
--log-format=text
--domain-filter=<my-domain.tld>
--policy=sync
--provider=aws
--registry=txt
--interval=1m
--txt-owner-id=<hosted-zone-id>
--source=service
--source=ingress
--aws-api-retries=3
--aws-zone-type=public
--aws-assume-role=arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
--aws-batch-change-size=1000
Liveness: http-get http://:http/healthz delay=10s timeout=5s period=10s #success=1 #failure=2
Readiness: http-get http://:http/healthz delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
AWS_DEFAULT_REGION: us-east-1
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: external-dns-xxxxxxxxxx (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 9m30s deployment-controller Scaled up replica set external-dns-xxxxxxxxxx to 1
AWS IAM (AllowExternalDNSUpdates)
IAM Role (Trust Relationship)
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::<userid>:oidc-provider/oidc.eks.region-code.amazonaws.com/id/<oidc-id>"
},
"Action": "sts:AssumeRoleWithWebIdentity"
}
]
}
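For comparison, a working IRSA trust policy typically looks like the sketch below. This follows the standard AWS docs pattern rather than reproducing the poster's exact fix: the OIDC issuer host uses the cluster's actual region (us-east-1, per AWS_DEFAULT_REGION in the pod above) instead of the region-code placeholder, and a Condition scopes the role to the external-dns service account in the default namespace (JSON allows no comments, so all caveats live here):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<user-id>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/<oidc-id>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/<oidc-id>:sub": "system:serviceaccount:default:external-dns",
          "oidc.eks.us-east-1.amazonaws.com/id/<oidc-id>:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}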
IAM Policy (Permissions)
{
"Statement": [
{
"Action": "route53:ChangeResourceRecordSets",
"Effect": "Allow",
"Resource": "arn:aws:route53:::hostedzone/*",
"Sid": ""
},
{
"Action": [
"route53:ListResourceRecordSets",
"route53:ListHostedZones"
],
"Effect": "Allow",
"Resource": "*",
"Sid": ""
}
],
"Version": "2012-10-17"
}
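A quick way to check whether the trust policy actually matches the cluster's OIDC provider (a sketch; <cluster-name> is a placeholder for the real EKS cluster name):

# Print the cluster's OIDC issuer URL.
aws eks describe-cluster --name <cluster-name> \
  --query "cluster.identity.oidc.issuer" --output text

# The Federated principal and Condition keys in the trust policy must use
# the same host and id (minus the https:// prefix).
aws iam get-role --role-name AllowExternalDNSUpdates \
  --query "Role.AssumeRolePolicyDocument"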
Answer
So basically it was two things:
- (Credit to @Jordanm in the comments.) The trust relationship was incorrect; I edited the post to fix it and re-ran my configs. My problem then turned into:
records retrieval failed: failed to list hosted zones: AccessDenied: User: arn:aws:sts::<userid>:assumed-role/AllowExternalDNSUpdates/1663776911448118272 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::<userid>:role/AllowExternalDNSUpdates\n\tstatus
- Because of that additional error, I had to go back and fix my Terraform Helm config and remove the "assume-role" setting. Basically, if you see the second error ("assumed-role trying to assume-role"), you are assuming the role twice: IRSA already assumed AllowExternalDNSUpdates via the web identity token, and the --aws-assume-role flag then told external-dns to assume that same role again, which the trust policy doesn't allow. (A sketch of the config change follows.)
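If it helps, here is a minimal sketch of the Terraform side of that fix. It assumes the --aws-assume-role flag in the pod args came from the Bitnami chart's aws.assumeRoleArn value (that value name is an assumption; check your chart version's values):

# Hypothetical helm_release for the Bitnami external-dns chart.
resource "helm_release" "external_dns" {
  name       = "external-dns"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "external-dns"

  # With IRSA, annotating the service account is enough; the EKS pod
  # identity webhook injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE.
  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = "arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates"
  }

  # Do NOT also set aws.assumeRoleArn (the assumed name of the value
  # behind --aws-assume-role): that makes the already-assumed role try
  # to assume itself again, producing the second AccessDenied above.
}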