We followed these instructions to set up DataDog in our Kubernetes 1.22 cluster, using their operator. This was installed via helm with no customisations.
The operator, cluster-agent, and per-node agent pods are all running as expected. We know that the agents are able to communicate successfully with the DataDog endpoint because our new cluster shows up in the Infrastructure List view of DataDog.
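For reference, the operator install was just the stock chart from the public Datadog Helm repo, roughly along these lines (chart coordinates assumed; the release and namespace names are ours):

# Add the public Datadog Helm repo and install the operator chart
helm repo add datadog https://helm.datadoghq.com
helm repo update
helm install datadog-operator datadog/datadog-operator -n datadog --create-namespace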
However, logs from our application's pods aren't appearing in DataDog and we're struggling to figure out why.
Some obvious things we made sure to confirm:
- agent.log.enabled is true in our agent spec (full YAML included below).
- our application pods' logs are present in /var/log/pods/ and contain the log lines we were expecting.
- the DataDog agent is able to see these log files (checked roughly as shown below).
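The check for that last point was along these lines (agent pod name is a placeholder, and we're assuming the main container in the operator-managed pod is called "agent"):

# List the per-pod log directories from inside the node agent container
kubectl exec -n datadog datadog-agent-xxxxx -c agent -- ls -l /var/log/pods/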
So it seems that something is going wrong somewhere between the agent picking the logs up and them appearing in the DataDog UI. Does anyone have any ideas for how to debug this?
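In case it helps anyone answering: the node agent has a built-in status command whose Logs Agent section shows which files are being tailed and how many logs were processed/sent, so that output is presumably the first place to look (pod name is a placeholder):

# Dump the agent's status, including the Logs Agent section
kubectl exec -it -n datadog datadog-agent-xxxxx -c agent -- agent status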
Configuration of our agents:
apiVersion: datadoghq.com/v1alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: datadog
spec:
  agent:
    apm:
      enabled: false
    config:
      tolerations:
        - operator: Exists
    image:
      name: "gcr.io/datadoghq/agent:latest"
    log:
      enabled: true
    process:
      enabled: false
      processCollectionEnabled: false
  clusterAgent:
    config:
      admissionController:
        enabled: true
        mutateUnlabelled: true
      clusterChecksEnabled: true
      externalMetrics:
        enabled: true
    image:
      name: "gcr.io/datadoghq/cluster-agent:latest"
    replicas: 1
  clusterChecksRunner: {}
  credentials:
    apiSecret:
      keyName: api-key
      secretName: datadog-secret
    appSecret:
      keyName: app-key
      secretName: datadog-secret
  features:
    kubeStateMetricsCore:
      enabled: false
    logCollection:
      enabled: true
    orchestratorExplorer:
      enabled: false
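We applied this manifest in the usual way and the operator picked it up (the filename is ours):

# Apply the DatadogAgent resource and confirm the operator has reconciled it
kubectl apply -f datadog-agent.yaml
kubectl get datadogagent -n datadog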
Here are the environment variables for one of the DataDog agents:
DD_API_KEY : secretKeyRef(datadog-secret.api-key)
DD_CLUSTER_AGENT_AUTH_TOKEN : secretKeyRef(datadog.token)
DD_CLUSTER_AGENT_ENABLED : true
DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME : datadog-cluster-agent
DD_COLLECT_KUBERNETES_EVENTS : false
DD_DOGSTATSD_ORIGIN_DETECTION : false
DD_DOGSTATSD_SOCKET : /var/run/datadog/statsd/statsd.sock
DD_EXTRA_CONFIG_PROVIDERS : clusterchecks endpointschecks
DD_HEALTH_PORT : 5555
DD_KUBERNETES_KUBELET_HOST : fieldRef(v1:status.hostIP)
DD_LEADER_ELECTION : false
DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL : false
DD_LOGS_CONFIG_K8S_CONTAINER_USE_FILE : true
DD_LOGS_ENABLED : true
DD_LOG_LEVEL : INFO
KUBERNETES : yes
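(These were read off the running pod, e.g. via kubectl describe; the pod name is a placeholder.)

# Show the environment block of the agent container
kubectl describe pod -n datadog datadog-agent-xxxxx | grep -A 20 'Environment:'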
CodePudding user response:
If you are able to see metrics, then for missing logs I can see two possible reasons:
- Log collection was not enabled during the Helm installation. Enable it with:
helm upgrade -i datadog datadog/datadog --set datadog.apiKey=my-key --set datadog.logs.enabled=true
- Wrong region/site configuration: by default the agent sends data to the US site (datadoghq.com). If your account is on a different site, set it explicitly, e.g.:
helm upgrade -i datadog datadog/datadog --set datadog.apiKey=my-key --set datadog.site=us5.datadoghq.com
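To see which site a running agent is actually using, you can dump its resolved configuration from inside the agent container (pod name is a placeholder; the exact key names may vary by agent version):

# Print the runtime config and pull out the intake site settings
kubectl exec -n datadog datadog-agent-xxxxx -c agent -- agent config | grep -E '^(site|dd_url)'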
If these two are correct, make sure the pods write their logs to stdout/stderr, since the default log path mount looks right (a quick check follows the snippet):
- name: logpodpath
  mountPath: /var/log/pods
  mountPropagation: None
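A quick way to confirm that is to read the pod's logs through the kubelet itself; if kubectl shows your log lines, they are reaching stdout/stderr and being written under /var/log/pods (pod and namespace names are placeholders):

# If this prints your application's log lines, they are on stdout/stderr
kubectl logs -n my-namespace my-app-pod --tail=20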
Apart from that, you also need to allowlist the containers to collect logs from, or you can set the environment variable below to true so the agent collects logs from all containers (in your agent's environment listing above it is currently false):
DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true
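Since you are running the agents through the operator rather than the chart, the equivalent toggle should live in the DatadogAgent spec; a sketch for the v1alpha1 schema is below (field name assumed from the v1alpha1 CRD, worth double-checking against your operator version). With the Helm chart the same thing is --set datadog.logs.containerCollectAll=true.

# DatadogAgent (v1alpha1) excerpt; should render as DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true on the node agents
spec:
  agent:
    log:
      enabled: true
      logsConfigContainerCollectAll: true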