Deployment Troubleshooting

This guide covers common issues administrators may encounter when deploying Nullafi Shield and provides actionable steps to resolve them.


Container Fails to Start

Symptoms: docker-compose up -d completes but the Shield container exits immediately or never reaches a healthy state.

Possible causes and solutions:

  • Missing mandatory environment variables — Shield requires several environment variables to start. Check the container logs for errors referencing missing configuration:

    docker-compose logs shield-web-ui
    
    Review the Deployment Option Reference and ensure all required variables are present in your .env file or compose definition.

  • Invalid license key — If NULLAFI_LICENSE_KEY is malformed or expired, the container will exit. Verify the key value is correct, or if using a license file, confirm the volume mount path matches what is specified in the compose file.

  • Image not pulled — If the registry is unreachable, docker-compose up -d falls back to whatever image is cached locally (or fails outright if none is cached), so you may end up running a stale version. Run docker-compose pull before docker-compose up -d to ensure the latest image is present.
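
    As a quick pre-flight check, a small script can confirm that the variables you expect are present in the .env file before starting the stack. The variable list below is illustrative; take the authoritative list from the Deployment Option Reference:

    ```shell
    #!/bin/sh
    # Pre-flight check: confirm the expected variables appear in the .env file.
    # The variable list here is illustrative; consult the Deployment Option
    # Reference for the authoritative list.
    ENV_FILE="${ENV_FILE:-.env}"
    missing=0
    for var in NULLAFI_LICENSE_KEY NULLAFI_REDIS_HOST NULLAFI_ELASTICSEARCH_HOST; do
        if ! grep -q "^${var}=" "$ENV_FILE" 2>/dev/null; then
            echo "MISSING: $var"
            missing=$((missing + 1))
        fi
    done
    echo "$missing variable(s) missing from $ENV_FILE"
    ```

    A count of zero means every listed variable is defined; any MISSING line points at a variable to add before retrying docker-compose up -d.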


Cannot Access the Admin Web Console

Symptoms: Browser shows a connection refused, timeout, or SSL error when navigating to the Shield hostname.

Possible causes and solutions:

  • Ports 80/443 already in use — Another process (e.g., nginx, Apache, or another container) may already be listening on those ports. Check with:

    sudo ss -tlnp | grep -E ':80|:443'
    
    Either stop the conflicting service or remap Shield's ports in the compose file.

  • DNS not resolving to the correct host — The value of NULLAFI_HTTP_CUSTOM_DOMAIN must resolve to the IP of the host running the shield-web-ui container. Verify DNS resolution from an admin workstation:

    nslookup <your-shield-domain>
    

  • ACME/Let's Encrypt certificate not issued — If NULLAFI_HTTPS_ENABLE_ACME is set, Let's Encrypt must be able to reach port 80 on your domain during the challenge. Ensure the host is publicly reachable on port 80. Check container logs for ACME errors:

    docker-compose logs shield-web-ui | grep -i acme
    
    If the host is not publicly accessible, disable ACME and provide your own certificate instead.

  • Firewall blocking inbound traffic — Confirm the host firewall (e.g., ufw, iptables, cloud security groups) allows inbound TCP on ports 80 and 443.
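
    Before chasing network issues, it is worth ruling out a malformed NULLAFI_HTTP_CUSTOM_DOMAIN value, since a value that includes a URL scheme or path can cause both DNS and certificate problems. A minimal sketch, assuming the variable should hold a bare hostname (the example value is a placeholder):

    ```shell
    # Sanity-check NULLAFI_HTTP_CUSTOM_DOMAIN: assumed to be a bare hostname,
    # not a full URL. The fallback value is a placeholder.
    domain="${NULLAFI_HTTP_CUSTOM_DOMAIN:-shield.example.com}"
    case "$domain" in
        http://*|https://*) result="ERROR: remove the URL scheme from '$domain'" ;;
        */*)                result="ERROR: remove the path from '$domain'" ;;
        *)                  result="OK: '$domain' looks like a bare hostname" ;;
    esac
    echo "$result"
    ```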


Configuration Database (Redis) Unreachable

Symptoms: Shield containers start but log errors about configuration not loading, or the Admin Console shows policies and settings as missing.

Possible causes and solutions:

  • Redis container not running — Check:

    docker-compose ps
    
    If the redis service is not Up, inspect its logs:
    docker-compose logs redis
    

  • Incorrect Redis connection settings — The Shield containers must be able to reach Redis on TCP port 6379 (default). Verify NULLAFI_REDIS_HOST and NULLAFI_REDIS_PORT match the actual Redis endpoint. In a multi-host deployment, confirm network routing between hosts allows this traffic.

  • Docker network misconfiguration — Containers in the same compose file share a default network. If you customized the shield_net network name, ensure all services reference the same network. Verify with:

    docker network ls
    docker network inspect shield_net
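
    To confirm which endpoint the containers will actually try, you can resolve the connection settings with the same fallback logic, then probe the port. The fallback host name redis matches the compose service name, and 6379 is the Redis default port noted above; treat both fallbacks as assumptions:

    ```shell
    # Print the config database endpoint, falling back to assumed defaults,
    # then probe it when nc is installed.
    REDIS_HOST="${NULLAFI_REDIS_HOST:-redis}"
    REDIS_PORT="${NULLAFI_REDIS_PORT:-6379}"
    echo "config db endpoint: ${REDIS_HOST}:${REDIS_PORT}"
    if command -v nc >/dev/null 2>&1; then
        nc -z -w 3 "$REDIS_HOST" "$REDIS_PORT" && echo "reachable" || echo "NOT reachable"
    fi
    ```

    Run this from each Shield host; a "NOT reachable" result in a multi-host deployment usually points at routing or firewall rules rather than Redis itself.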
    


Activity Database (Elasticsearch) Unreachable

Symptoms: Activity log is empty, Shield logs show Elasticsearch connection errors, or alerts are not firing.

Possible causes and solutions:

  • Elasticsearch container failed to start — Elasticsearch has its own memory requirements. A common cause of failure is the host's virtual memory limit being too low. Check:

    docker-compose logs activity
    
    If you see max virtual memory areas vm.max_map_count [65530] is too low, fix it with:
    sudo sysctl -w vm.max_map_count=262144
    
    To make this persistent, add vm.max_map_count=262144 to /etc/sysctl.conf.

  • Insufficient disk space — Elasticsearch requires adequate disk space to store index data. Check available space on the host with df -h and ensure the nullafi-activity volume has room to grow.

  • Incorrect Elasticsearch connection settings — Verify that NULLAFI_ELASTICSEARCH_HOST (default port TCP 9200) points to the correct host and is reachable from all Shield nodes.
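
    The kernel check above can be scripted so it works whether or not the Elasticsearch container has started yet; 262144 matches the value from the error message:

    ```shell
    # Check the virtual memory limit Elasticsearch needs on the host.
    required=262144
    current="$(cat /proc/sys/vm/max_map_count 2>/dev/null || sysctl -n vm.max_map_count 2>/dev/null)"
    if [ -z "$current" ]; then
        echo "could not read vm.max_map_count on this host"
    elif [ "$current" -lt "$required" ]; then
        echo "vm.max_map_count=$current is too low (need >= $required)"
    else
        echo "vm.max_map_count=$current is sufficient"
    fi
    ```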


ICAP Server Not Receiving Traffic

Symptoms: Traffic passes through the proxy but Shield never scans it; the Activity log remains empty.

Possible causes and solutions:

  • Port 1344 not reachable from the proxy — The proxy must be able to open TCP connections to the Shield ICAP node on port 1344 (or 11344 for Secure ICAP). Verify connectivity:

    nc -zv <shield-icap-host> 1344
    
    Check firewall rules on the Shield host and any network devices between the proxy and Shield.

  • ICAP server mode not configured — The Shield container acting as the ICAP server must have NULLAFI_SERVERMODE set to icap (or both). Confirm the compose service definition for shield-icap.

  • Proxy ICAP integration misconfigured — Review the proxy's ICAP configuration. The ICAP service URL should point to:

    icap://<shield-icap-host>:1344/shield
    
    Consult your proxy documentation (e.g., Squid, Zscaler) for the correct configuration format.

  • ICAP node cannot reach the databases — In a distributed deployment, each ICAP node must have network access to both the Redis (TCP 6379) and Elasticsearch (TCP 9200) endpoints. Validate connectivity from the ICAP host.
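
    The connectivity requirements in this section can be checked in one sweep. The host names below are placeholders; run the port-1344 check from the proxy host and the database checks from the ICAP host:

    ```shell
    # Probe each dependency; host names are placeholders for your deployment.
    for endpoint in shield-icap-host:1344 redis-host:6379 elasticsearch-host:9200; do
        host="${endpoint%:*}"
        port="${endpoint#*:}"
        echo "checking ${host}:${port}"
        if command -v nc >/dev/null 2>&1; then
            nc -z -w 3 "$host" "$port" && echo "  reachable" || echo "  NOT reachable"
        fi
    done
    ```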


SSL / HTTPS Traffic Not Being Inspected

Symptoms: HTTP traffic is scanned but HTTPS traffic passes through unmodified.

Possible causes and solutions:

  • Proxy not performing TLS interception (MITM) — Shield only inspects traffic that the proxy decrypts and forwards. The proxy must be configured for SSL/TLS inspection using a certificate trusted by client devices. Refer to your proxy's documentation.

  • Client devices do not trust the proxy's CA certificate — If clients see SSL errors, distribute and install the proxy's CA certificate to the trusted store on client machines or via MDM/GPO.

  • Proxy not configured as an ICAP client for HTTPS responses — Some proxies require separate ICAP rules for HTTP and HTTPS traffic. Ensure ICAP is enabled for both request and response modification on HTTPS traffic.


Product Update Fails or Rolls Back

Symptoms: After running docker-compose pull && docker-compose up -d, Shield behaves unexpectedly or reverts to the old version.

Possible causes and solutions:

  • Old image still cached — Docker may continue to use the previously pulled image. Confirm the new image is in use:

    docker-compose images
    
    Remove old images with:
    docker image prune
    

  • Registry authentication failure — If the image registry requires credentials, a silent pull failure means Docker uses the cached image. Check for auth errors in docker-compose pull output and re-authenticate with docker login.

  • Database schema incompatibility — In rare cases, a new Shield version may require a data migration. Check the release notes for any pre-upgrade steps before running the update.
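
    One way to make a failed pull impossible to miss is to pin an explicit version tag instead of relying on a floating tag. The image name and tag below are illustrative; use the repository and version from your release notes:

    ```yaml
    services:
      shield-web-ui:
        # With a pinned tag, "docker-compose pull" either fetches exactly this
        # version or fails loudly - it can never silently keep an older image
        # behind the same floating tag.
        image: nullafi/shield:1.2.3   # illustrative; take the tag from the release notes
    ```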


Multiple Shield Instances Cannot Communicate

Symptoms: In a multi-instance deployment, ICAP nodes do not reflect policy changes made in the Admin Console, or the Admin Console does not show ICAP nodes as connected.

Possible causes and solutions:

  • All instances not pointing to the same Redis — Every Shield node (web, icap, alert) must share the same NULLAFI_REDIS_HOST. Verify the .env or environment variable values on each host.

  • &all-shields YAML anchor mismatch — If you use the YAML block anchor pattern from the sample compose files, ensure the settings under &all-shields are identical across your compose files for all hosts.

  • NULLAFI_ICAP_NAME collision — If two ICAP nodes share the same NULLAFI_ICAP_NAME, they may overwrite each other's registration. Assign a unique name to each ICAP node.
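
    A sketch of how the anchor pattern keeps shared settings identical while leaving per-node values unique (all values here are illustrative; your sample compose files define the actual &all-shields contents):

    ```yaml
    x-all-shields: &all-shields
      # Shared settings: must be identical on every host.
      NULLAFI_REDIS_HOST: redis.internal
      NULLAFI_ELASTICSEARCH_HOST: elasticsearch.internal

    services:
      shield-icap:
        environment:
          <<: *all-shields
          # Per-node override after the merge: must be unique per ICAP node.
          NULLAFI_ICAP_NAME: icap-node-1
    ```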


General Diagnostic Steps

When facing any unexpected issue, the following steps help narrow down the root cause:

  1. Check container status:
    docker-compose ps
    
  2. Read container logs:
    docker-compose logs --tail=100 <service-name>
    
  3. Verify environment variables are set correctly:
    docker-compose config
    
    This prints the resolved compose configuration, showing the actual values after variable substitution.
  4. Test inter-service connectivity from within a container:
    docker exec -it <container-name> sh
    nc -zv <target-host> <port>
    
  5. Check host resource usage — Low memory or disk space can cause containers to crash or behave erratically:
    free -h
    df -h
    
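The steps above can be collected into a single report script to attach to a support request. Docker-specific commands are skipped when docker-compose is not on the PATH:

    ```shell
    #!/bin/sh
    # One-shot diagnostic sweep covering the steps above.
    echo "== host resources =="
    free -h 2>/dev/null || echo "(free not available on this host)"
    df -h
    if command -v docker-compose >/dev/null 2>&1; then
        echo "== container status =="
        docker-compose ps
        echo "== recent logs =="
        docker-compose logs --tail=100
        echo "== resolved configuration =="
        docker-compose config
    else
        echo "(docker-compose not found; skipping container checks)"
    fi
    ```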

If the issue persists after following these steps, contact Nullafi support with the output of docker-compose logs and a description of your deployment topology.