Troubleshooting
Things can go wrong. This section provides tips for debugging issues in your cluster and in the applications you deployed with quarks-operator.
You should also check the known issues for the release you are using.
Also, check out these awesome tools that can help you while debugging your cluster.
Kubernetes events
The quarks-operator reports the results of its actions through logs and Kubernetes events. If you notice issues, or if you are unsure about what is happening, first check the events of all namespaces (or of the namespace where you are deploying your BOSH release).
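A minimal sketch of that check, assuming kubectl is already configured for your cluster. The helper name quarks_events and its optional namespace argument are illustrative and not part of quarks-operator:

```shell
# Illustrative helper (not part of quarks-operator): list Kubernetes events,
# oldest first. Without an argument it covers all namespaces; with one
# argument it is limited to that namespace.
quarks_events() {
  if [ -n "${1:-}" ]; then
    kubectl get events --namespace "$1" --sort-by=.metadata.creationTimestamp
  else
    kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
  fi
}
```

Call quarks_events with no argument for the whole cluster, or pass the namespace of your deployment to narrow the output.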
Checking logs
Collecting logs from all the quarks-operator components can be challenging, so we suggest using stern.
For example, stern can stream the logs of all pods in your cluster to your terminal.
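One way to do this, sketched under the assumption that stern is installed and uses your current kubectl context. The helper name stream_all_logs is illustrative:

```shell
# Illustrative helper: follow the logs of every pod in every namespace.
# stern takes a regular expression as its pod query; "." matches any pod name.
stream_all_logs() {
  stern --all-namespaces .
}
```

On a busy cluster this produces a lot of output, so replacing "." with a narrower expression is usually more practical.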
With kubectl, you can check the logs of all containers in a pod.
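A sketch of the kubectl variant, assuming you know the namespace and name of the pod. The helper name pod_logs is illustrative:

```shell
# Illustrative helper: print the logs of all containers in one pod.
#   $1 = namespace, $2 = pod name
pod_logs() {
  kubectl logs "$2" --namespace "$1" --all-containers
}
```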
Debugging BOSH Releases
The quarks-operator exposes a debug parameter that lets you hook into a BOSH release before it starts. In this way you can open a shell in a pod for further debugging.
To do so, specify quarks.debug=true in the part of the deployment you are interested in debugging, for example in your ConfigMap.
Checking containers state
We strongly suggest using k9s. It gives a quick overview of your cluster state and lets you quickly inspect the container statuses of your pods.
In most cases, while you are debugging a BOSH deployment, you will have to check the states of all the containers running in a pod.
You can do that in k9s by navigating into the pod and pressing [Enter]. With plain kubectl, you can describe a resource to get all its events and, in the case of pods, an overview of all the containers belonging to it. The same approach works if you are checking the state of a QuarksSecret.
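Describing a resource with kubectl could look like the sketch below. The helper name describe_resource is illustrative, and the resource type quarkssecret assumes the QuarksSecret custom resource definition is installed under that name in your cluster:

```shell
# Illustrative helper: show the details and recent events of one resource.
#   $1 = namespace, $2 = resource type, $3 = resource name
describe_resource() {
  kubectl describe "$2" "$3" --namespace "$1"
}

# Usage, with placeholder names:
#   describe_resource my-namespace pod my-pod
#   describe_resource my-namespace quarkssecret my-secret
```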
My pod looks stuck
There are various reasons why this could happen. Here are a few suggestions:
- Check if your pod has a container with the wait-for prefix. This means the pod is waiting for a specific service, which can be identified by the container name.
- Check if your pod depends on a secret that has not been generated yet. If the secret is generated by a QuarksSecret, check the events related to the QuarksSecret that should generate it (with kubectl describe ..., or kubectl get events).
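The first check can be sketched as follows. The helper name list_wait_for_containers is illustrative. Because this section does not say whether the wait-for containers are init containers or regular ones, the sketch lists both kinds:

```shell
# Illustrative helper: print the names of all containers in a pod whose name
# starts with "wait-for". Exits non-zero if there are none.
#   $1 = namespace, $2 = pod name
list_wait_for_containers() {
  kubectl get pod "$2" --namespace "$1" \
    --output jsonpath='{range .spec.initContainers[*]}{.name}{"\n"}{end}{range .spec.containers[*]}{.name}{"\n"}{end}' \
    | grep '^wait-for'
}
```

Each name printed hints at the service the pod is waiting for.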
Cluster CA
The quarks-operator assumes that the cluster root CA is also used for signing CSRs via the certificates.k8s.io API and will embed this CA in the generated certificate secrets. If your cluster is set up to use a different cluster-signing CA, the generated certificates will have the wrong CA embedded. See https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/ for more information on cluster trust.
Recovering from a crash
If the operator pod crashes because of unrecoverable errors, it cannot be restarted in the same namespace until the existing mutating webhook configuration for that namespace is removed. The operator uses mutating webhooks to modify pods on the fly, and Kubernetes fails to create pods if the webhook server is unreachable. The webhook configurations are installed cluster-wide and don't belong to a single namespace, just like custom resources.
To recover, remove the webhook configurations for the quarks-operator namespace with kubectl.
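The removal could look like the sketch below. The helper names are illustrative, and the configuration name is deliberately left as an argument because it depends on your installation: list the configurations first, then delete the one that belongs to the operator's namespace.

```shell
# Illustrative helper: list all mutating webhook configurations in the
# cluster, so you can find the one installed by the operator.
list_mutating_webhook_configs() {
  kubectl get mutatingwebhookconfigurations
}

# Illustrative helper: delete one mutating webhook configuration by name.
#   $1 = configuration name, taken from the listing above
delete_mutating_webhook_config() {
  kubectl delete mutatingwebhookconfiguration "$1"
}
```

Deleting a configuration affects the whole cluster, so double-check the name before running the second helper.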
From Kubernetes 1.15 onwards, it is possible to patch the webhook configurations for the cf-operator namespace instead of removing them.
See also how to monitor admission webhooks if you are debugging the changes introduced by a quarks-operator webhook.