# Vault Agent Injector on Talos: What I Learned
If you have been following the TazLab cluster story so far, you know the path toward dynamic secrets has been a slow but methodical advance. First, migrating static secrets from Infisical to Vault. Then, exposing services on the tailnet with the Tailscale Operator. Then, the first steps toward dynamic secrets: JWT auth, the database engine, and a PKI incident that destroyed the cluster.
The last mile was the Vault Agent Injector: the component that lets pods receive credentials directly from Vault without going through a Kubernetes Secret. The plan: deploy the injector, configure JWT authentication, and migrate Grafana from a static ExternalSecret to dynamic PostgreSQL credentials.
It seemed straightforward. It was not.
## The Most Subtle Bug: Tailscale on Linux Does Not Accept Routes
The first problem appeared before I even touched the injector. Vault, running on a Hetzner VM inside a Podman container, needed to reach the cluster’s JWKS endpoint to validate pod JWTs. I had exposed the API server on the tailnet with a LoadBalancer Service. The VIP was 100.110.87.98. From Vault, unreachable.
`tailscale ping 100.110.87.98` returned `no matching peer`.
I checked the ACLs: rule tag:tazlab-vault → tag:k8s:6443 present. I checked the proxy pods: Running. I checked the Tailscale device: registered. Everything looked correct, but nothing worked.
The cause was a Tailscale default on Linux that I did not know about. On Windows, macOS, and Android, Tailscale automatically accepts routes announced by other nodes. On Linux, it does not. The `--accept-routes` flag is false by default, and Anycast VIPs from Tailscale Services (the technology behind LoadBalancers and ProxyGroups) are not routable without it.
The fix was a single command on the Hetzner VM:
```shell
sudo tailscale set --accept-routes=true --accept-dns=true
```

One flag. Hours of debugging. The lesson is clear: if you use Tailscale on Linux and service VIPs are unreachable, check `--accept-routes` first.
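A quick check of the node's preferences can catch this before it costs hours. Below is a minimal Python sketch. It assumes that the JSON printed by `tailscale debug prefs` exposes `RouteAll` (the `--accept-routes` setting) and `CorpDNS` (the `--accept-dns` setting), as current releases do. `route_warnings` is a hypothetical helper, not part of Tailscale's tooling.

```python
import json


def route_warnings(prefs_json: str) -> list:
    """Return warnings for Tailscale prefs that break Service VIPs on Linux.

    Assumes the JSON shape printed by `tailscale debug prefs`, where
    RouteAll mirrors --accept-routes and CorpDNS mirrors --accept-dns.
    """
    prefs = json.loads(prefs_json)
    warnings = []
    # RouteAll defaults to False on Linux: routes announced by other nodes
    # (including Tailscale Services VIPs) are not installed.
    if not prefs.get("RouteAll", False):
        warnings.append("accept-routes is off: run 'tailscale set --accept-routes=true'")
    # CorpDNS controls whether the node uses the tailnet DNS settings.
    if not prefs.get("CorpDNS", False):
        warnings.append("accept-dns is off: run 'tailscale set --accept-dns=true'")
    return warnings
```

To use it, feed it the output of `tailscale debug prefs` captured on the Vault VM. An empty list means both settings are on.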
## ProxyGroup kube-apiserver: Better Than the Previous Solution
The LoadBalancer Service I had created to expose the API server was replaced by a ProxyGroup of type kube-apiserver. It is the official Tailscale pattern for exposing the control plane, and it supports native HA with TLS via Let’s Encrypt.
```yaml
apiVersion: tailscale.com/v1alpha1
kind: ProxyGroup
metadata:
  name: lushycorp-apiserver-proxy
  namespace: tailscale
spec:
  type: kube-apiserver
  replicas: 2
  tags: ["tag:k8s"]
  kubeAPIServer:
    mode: noauth
```

The ProxyGroup also solved a tag problem. The old LoadBalancer used `tag:k8s`, but the ACL rule pointed to `tag:k8s:6443`, which did not work with the grants mechanism that Tailscale Services require. With the ProxyGroup, everything aligned: a grant from `tag:tazlab-vault` to `tag:k8s` on port 443, plus `autoApprovers` for services.
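For reference, a grant in the Tailscale policy file is an object with `src`, `dst`, and `ip` lists. The Python sketch below is a deliberately simplified evaluator that shows why the old rule no longer matched. The ProxyGroup serves the API on 443, so a rule scoped to 6443 does not cover it.

`grant_allows` is hypothetical and is not how Tailscale evaluates policy. It handles only literal tags and `tcp:<port>`, `<port>`, or `*` entries, and it does not model `autoApprovers`.

```python
def grant_allows(policy: dict, src_tag: str, dst_tag: str, port: int) -> bool:
    """Simplified check: does any grant let src_tag reach dst_tag on a TCP port?

    Only literal tags and "tcp:<port>", "<port>" or "*" entries in "ip" are
    understood; real policy evaluation covers far more than this.
    """
    for grant in policy.get("grants", []):
        if src_tag not in grant.get("src", []) or dst_tag not in grant.get("dst", []):
            continue
        for entry in grant.get("ip", []):
            proto, _, ports = entry.rpartition(":")
            if entry == "*" or (proto in ("", "tcp") and ports == str(port)):
                return True
    return False


policy = {
    "grants": [
        # Vault on the Hetzner VM -> kube-apiserver ProxyGroup (HTTPS on 443)
        {"src": ["tag:tazlab-vault"], "dst": ["tag:k8s"], "ip": ["tcp:443"]},
    ],
}
```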
## vault-k8s Bug #660: When the Injector Generates the Wrong Config
With the cluster ready and the JWKS endpoint reachable, I deployed the Vault Agent Injector via Helm:
```yaml
# values.yaml for chart hashicorp/vault, injector-only mode
server:
  enabled: false   # no Vault server inside the cluster
global:
  externalVaultAddr: "<vault-address>"   # placeholder: the external Vault on the Hetzner VM
```

I created a test pod with JWT authentication annotations. The `vault-agent-init` init container failed with a cryptic error:

```
Error creating jwt auth method: missing 'path' value
```

The config generated by the injector was:
```json
{
  "auto_auth": {
    "method": {
      "type": "jwt",
      "mount_path": "auth/jwt",
      "config": {
        "role": "smoketest",
        "token_path": "/var/run/secrets/..."
      }
    }
  }
}
```

The problem is subtle but devastating. For the `kubernetes` auth method, the parameter is called `token_path`. For the `jwt` method, the mandatory parameter is called `path`. The injector (vault-k8s v1.7.2) generates `token_path` in both cases, so for JWT the `path` parameter is missing and the configuration is rejected.
It is a known bug, tracked as hashicorp/vault-k8s issue #660. The workaround is to force the path parameter with an explicit annotation:
```yaml
vault.hashicorp.com/auth-config-path: "/var/run/secrets/kubernetes.io/serviceaccount/token"
vault.hashicorp.com/auth-config-remove-jwt-after-reading: "false"
```

Without the first one, the `path` parameter is not generated. Without the second one, the agent tries to delete the JWT after reading it. That delete fails on read-only filesystems (like Talos), and the agent crashes.
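The effect of the two annotations can be sketched in Python. The sketch assumes, as the workaround itself implies, that the injector copies every `vault.hashicorp.com/auth-config-*` annotation into the auth method's `config` block. The annotation suffix becomes the key, with hyphens turned into underscores. `agent_auth_config` is a hypothetical helper, not vault-k8s code.

```python
def agent_auth_config(default_config: dict, annotations: dict) -> dict:
    """Overlay vault.hashicorp.com/auth-config-* annotations onto a method config.

    Assumed mapping: the annotation suffix becomes the config key, with
    hyphens replaced by underscores (e.g. remove-jwt-after-reading).
    """
    prefix = "vault.hashicorp.com/auth-config-"
    config = dict(default_config)  # do not mutate the generated config
    for key, value in annotations.items():
        if key.startswith(prefix):
            config[key[len(prefix):].replace("-", "_")] = value
    return config


# What vault-k8s v1.7.2 generated for type "jwt" (from the config shown earlier).
generated = {"role": "smoketest", "token_path": "/var/run/secrets/..."}

workaround = {
    "vault.hashicorp.com/auth-config-path": "/var/run/secrets/kubernetes.io/serviceaccount/token",
    "vault.hashicorp.com/auth-config-remove-jwt-after-reading": "false",
}

fixed = agent_auth_config(generated, workaround)
```

The sketch leaves the generated `token_path` in place. The workaround adds `path` rather than removing anything.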
## Multi-Issuer: Why bound_issuer Was Wrong
With the workaround applied, authentication reached Vault but was rejected:
```
error validating token: invalid issuer (iss) claim
```

The Kubernetes ServiceAccount token contains an `iss` field declaring who issued the token. My Vault `bound_issuer` was configured as `https://lushycorp-k8s.magellanic-gondola.ts.net:6443`. But the token contained:

```
"iss": "https://lushycorp-k8s.magellanic-gondola.ts.net:6443,https://kubernetes.default.svc.cluster.local"
```

Two issuers, concatenated. This is Talos multi-issuer: the API server's `service-account-issuer` configuration accepts a comma-separated list, and the resulting JWT includes ALL issuers in the `iss` field. Vault must match that exact string, comma included.
## The Sidecar That Would Not Work: PodSecurity, Read-Only FS, and runAsUser
With JWT authentication working and the database engine configured, the smoke test passed: the init container wrote credentials to `/vault/secrets/` and the pod started. But the vault-agent sidecar (the one meant to stay running and renew tokens) crashed immediately.
The cause was twofold. First, Talos enforces PodSecurity restricted, which requires containers to run as non-root; in practice that means an explicit `runAsUser` on every container. Grafana did not have one, and the `agent-run-as-same-user: "true"` annotation failed because the main container's UID was nil. Second, the sidecar tries to write cache files to `/tmp`, but on Talos some mounts are read-only.
I fixed it by adding securityContext.runAsUser: 1000 to the Grafana container and the annotations to align the sidecar’s UID:
```yaml
grafana:
  containerSecurityContext:
    runAsUser: 1000
    runAsGroup: 1000
  podAnnotations:
    vault.hashicorp.com/agent-run-as-user: "1000"
    vault.hashicorp.com/agent-run-as-group: "1000"
    vault.hashicorp.com/agent-share-process-namespace: "true"
```

This pattern is required for ALL injected pods on Talos.
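The pattern has to be repeated for every injected pod, so a small pre-flight check can help. The Python sketch below works on plain dicts shaped like a PodSpec and its annotations. `injection_problems` is a hypothetical helper. It checks only the rules described in this post, not the full PodSecurity restricted profile.

```python
SAME_USER = "vault.hashicorp.com/agent-run-as-same-user"
RUN_AS_USER = "vault.hashicorp.com/agent-run-as-user"


def injection_problems(pod_spec: dict, annotations: dict) -> list:
    """Return reasons the vault-agent sidecar is likely to fail (simplified checks)."""
    problems = []
    containers = pod_spec.get("containers", [])
    # Rule from this post: every app container needs an explicit runAsUser.
    for container in containers:
        if (container.get("securityContext") or {}).get("runAsUser") is None:
            problems.append(f"{container['name']}: no explicit runAsUser")
    main_uid = None
    if containers:
        main_uid = (containers[0].get("securityContext") or {}).get("runAsUser")
    if annotations.get(SAME_USER) == "true":
        # agent-run-as-same-user reuses the first container's UID; nil breaks it.
        if main_uid is None:
            problems.append("agent-run-as-same-user needs runAsUser on the first container")
    else:
        agent_uid = annotations.get(RUN_AS_USER)
        if agent_uid is None:
            problems.append("agent-run-as-user annotation is missing")
        elif main_uid is not None and agent_uid != str(main_uid):
            problems.append(f"sidecar UID {agent_uid} differs from app UID {main_uid}")
    return problems
```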
## Podman DNS: A Boot-Time Race Condition
Vault runs in a Podman container on the Hetzner VM. During the session, I had manually fixed the container’s DNS by adding nameserver 100.100.100.100 to /etc/resolv.conf. But after a reboot, the problem would reappear.
The cause? Vault’s systemd unit did not depend on tailscaled.service, so the container started before Tailscale had updated the system resolver. At boot, the container saw Hetzner DNS, not Tailscale DNS.
I patched the systemd unit and updated the Ansible template:
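```ini
[Unit]
# Order the unit after the network and tailscaled, and pull both in
After=network-online.target tailscaled.service
Wants=network-online.target tailscaled.service

[Service]
# Poll `tailscale status` up to 30 times, one second apart.
# The loop exits 0 even if Tailscale never becomes ready, so the unit still starts.
ExecStartPre=/bin/sh -c 'for i in $(seq 30); do tailscale status >/dev/null 2>&1 && break; sleep 1; done'
```

Now the container waits up to 30 seconds for Tailscale to be ready before starting.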
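After a reboot, you can confirm that the container really sees Tailscale DNS by looking at its resolver configuration. Here is a minimal Python sketch. It assumes you capture the file with something like `podman exec vault cat /etc/resolv.conf`, where the container name `vault` is an assumption. The helper names are hypothetical.

```python
def nameservers(resolv_conf: str) -> list:
    """Return the nameserver addresses in resolv.conf text, in file order."""
    servers = []
    for line in resolv_conf.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue  # blank line or comment
        fields = stripped.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def uses_magicdns(resolv_conf: str) -> bool:
    # With default options the resolver tries nameservers in order,
    # so Tailscale's 100.100.100.100 should come first.
    servers = nameservers(resolv_conf)
    return bool(servers) and servers[0] == "100.100.100.100"
```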
## Grafana Without ExternalSecret: The Clean Migration
The final phase was migrating Grafana from a static ExternalSecret (with a PostgreSQL password in a Kubernetes Secret) to dynamic credentials via the Vault Agent Injector.
The final configuration requires:
- Injection annotations (JWT auth, path, template)
- Env vars `GF_DATABASE_USER__FILE` and `GF_DATABASE_PASSWORD__FILE` reading from `/vault/secrets/`
- `grafana.ini.database.user: ''` to use only the env vars
- Removal of the ExternalSecret and `secrets.yaml`
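Grafana reads any setting whose environment variable ends in `__FILE` from the file at that path. That is what lets the injected files replace the Kubernetes Secret. The Python sketch below only imitates that convention to make the data flow explicit. `resolve_file_env` is hypothetical, and stripping surrounding whitespace is my own assumption for handling a trailing newline written by a template.

```python
def resolve_file_env(env: dict) -> dict:
    """Replace each NAME__FILE=/path entry with NAME=<contents of /path>.

    Imitates Grafana's __FILE convention for illustration; stripping whitespace
    is an assumption made here, not documented Grafana behavior.
    """
    suffix = "__FILE"
    resolved = {}
    for key, value in env.items():
        if key.endswith(suffix):
            with open(value, encoding="utf-8") as fh:
                resolved[key[: -len(suffix)]] = fh.read().strip()
        else:
            resolved[key] = value
    return resolved
```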
One unexpected obstacle: in kube-prometheus-stack the Grafana pod annotations are not a top-level value. They belong to the grafana subchart, so the correct key is `grafana.podAnnotations`, not `grafana.annotations`. Additionally, the Consul Template in the annotation (`{{- with secret ...}}`) uses double braces that Helm interprets as Go template syntax, and an unquoted YAML value cannot start with `{` in any case. The solution: single-quoted YAML around the template.
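Quoting by hand is easy to get wrong, so here is a minimal Python sketch that renders the annotation lines with YAML single quotes. In that style the only escape is doubling a literal `'`. The secret path `database/creds/grafana` and the file name `db-password` are hypothetical. `agent-inject-secret-<name>` and `agent-inject-template-<name>` are the injector annotations that write `/vault/secrets/<name>`.

```python
def yaml_single_quoted(value: str) -> str:
    """Quote a string as a YAML single-quoted scalar ('' is the only escape)."""
    return "'" + value.replace("'", "''") + "'"


def vault_template_annotations(name: str, secret_path: str, template: str) -> list:
    """Render inject-secret / inject-template annotation lines for Helm values."""
    prefix = "vault.hashicorp.com/agent-inject"
    return [
        f"{prefix}-secret-{name}: {yaml_single_quoted(secret_path)}",
        f"{prefix}-template-{name}: {yaml_single_quoted(template)}",
    ]


# Hypothetical role path: adjust to your database secrets engine mount and role.
tmpl = '{{- with secret "database/creds/grafana" -}}{{ .Data.password }}{{- end }}'
for line in vault_template_annotations("db-password", "database/creds/grafana", tmpl):
    print(line)
```

Paste the printed lines, indented, under `grafana.podAnnotations`.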
## What I Learned
- `--accept-routes` on Linux is not optional. If you use Tailscale Services (ProxyGroup or LoadBalancer), every Linux client must explicitly accept routes. Desktop and mobile clients do it automatically; Linux does not.
- vault-k8s v1.7.2 has a bug with JWT. The `token_path` parameter does not exist for the JWT method, which expects `path`. Until HashiCorp fixes issue #660, two workaround annotations are required.
- Multi-issuer is not optional on Talos. If you configure `service-account-issuer` with more than one issuer, the JWT will contain all of them concatenated. Vault must match exactly.
- Talos + sidecar does not work without `runAsUser`. PodSecurity restricted effectively requires an explicit UID. Align the main container's UID and the sidecar's UID with `agent-run-as-user` and `containerSecurityContext`.
- Podman + Tailscale DNS is a race condition. If the container starts before Tailscale, the DNS is wrong. The fix: an explicit dependency in the systemd unit.
- Helm + Consul Template is a known problem. Double braces in Helm values conflict. Single-quoted YAML works and is the most common pattern in the community.


