When things break: failover, fallback, diagnostics

When things break: failover, fallback, diagnostics and the troubleshooting checklist

Lighting networks fail. Cables come loose, switches reboot, consoles freeze, gateways overheat. A protocol that doesn't think hard about failure is a protocol that ruins shows. Sig-Net handles failure in three distinct ways, and adds a generous diagnostic vocabulary so you can work out afterwards what went wrong.

Stream loss vs. failover vs. fallback

These three words sound interchangeable but mean different things in Sig-Net.

Stream loss is what triggers the others. If an endpoint is patched to a universe and hasn't received a valid level packet for three seconds, it considers the stream lost. That timer is independent of polling — even if the management network is fine, a quiet Sender will trigger stream loss.
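The stream-loss rule can be sketched as a small watchdog. This is an illustration only: the class and method names are invented for this post, and starting the timer at patch time is an assumption. The only figures taken from the text are the three-second timeout and the fact that polling does not reset it.

```python
STREAM_LOSS_TIMEOUT_S = 3.0  # from the text: three seconds without a valid level packet


class StreamWatchdog:
    """Hypothetical tracker for one endpoint patched to one universe."""

    def __init__(self, now: float, timeout_s: float = STREAM_LOSS_TIMEOUT_S) -> None:
        self.timeout_s = timeout_s
        # Assumption: the timer starts when the endpoint is patched.
        self.last_valid_level = now

    def on_valid_level_packet(self, now: float) -> None:
        # Only a valid level packet resets the timer.
        self.last_valid_level = now

    def on_poll(self, now: float) -> None:
        # Polling is management traffic; it deliberately leaves the timer alone.
        pass

    def stream_lost(self, now: float) -> bool:
        return (now - self.last_valid_level) >= self.timeout_s
```

Passing the clock in as `now` keeps the sketch testable; a real device would read a monotonic timer instead.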

Failover is what the endpoint does when the stream goes quiet. It's configurable per endpoint, with five options: hold the last look, blackout (all slots to zero), full (all slots to 255), play an internal scene the device has stored, or stop generating DMX entirely. Pick the one that makes sense for the rig — most house rigs prefer "hold last look", architectural installs sometimes want "play scene", and concert touring almost always wants "blackout" so a dropped stream is obvious to the operator.
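To make the five options concrete, here is a minimal sketch of how an endpoint might choose its output once the stream is lost. The enum and function names are made up for illustration and are not Sig-Net identifiers; `None` stands in for "stop generating DMX".

```python
from enum import Enum
from typing import Optional

DMX_SLOTS = 512


class FailoverMode(Enum):
    # Hypothetical labels for the five behaviours described above.
    HOLD_LAST = "hold last look"
    BLACKOUT = "blackout"
    FULL = "full"
    PLAY_SCENE = "play scene"
    STOP_OUTPUT = "stop output"


def failover_output(mode: FailoverMode, last_look: bytes,
                    stored_scene: bytes) -> Optional[bytes]:
    """Return the slot data to drive after stream loss, or None for no DMX at all."""
    if mode is FailoverMode.HOLD_LAST:
        return bytes(last_look)
    if mode is FailoverMode.BLACKOUT:
        return bytes(DMX_SLOTS)            # all slots at zero
    if mode is FailoverMode.FULL:
        return bytes([255]) * DMX_SLOTS    # all slots at 255
    if mode is FailoverMode.PLAY_SCENE:
        return bytes(stored_scene)
    if mode is FailoverMode.STOP_OUTPUT:
        return None
    raise ValueError(f"unknown failover mode: {mode!r}")
```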

Fallback is a special direction setting on a port. A port in fallback mode normally listens to physical DMX coming in (it's acting as an input), but if that DMX line goes dead, the port automatically flips to being an output and drives the Sig-Net stream onto the local DMX line. This is how you do severed-cable redundancy with a single gateway port: the port is an input until something cuts the cable, then it immediately becomes a backup output. No reconfiguration required.
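The fallback behaviour is essentially a two-state machine. The sketch below is illustrative: the class name and the `input_timeout_s` parameter are hypothetical, because the text does not say how long the DMX line must be silent before it counts as dead. It also does not model any switch back to input.

```python
from dataclasses import dataclass


@dataclass
class FallbackPort:
    """Hypothetical model of a port in fallback mode."""
    input_timeout_s: float        # assumption: silence needed to call the line dead
    last_dmx_in: float = 0.0
    direction: str = "input"

    def on_dmx_frame_in(self, now: float) -> None:
        # Physical DMX arriving on the port keeps it in input mode.
        if self.direction == "input":
            self.last_dmx_in = now

    def tick(self, now: float) -> str:
        # Once the physical line has been silent long enough, flip to output
        # and start driving the Sig-Net stream onto the local DMX.
        if self.direction == "input" and now - self.last_dmx_in >= self.input_timeout_s:
            self.direction = "output"
        return self.direction
```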

Diagnostics on the wire

Sig-Net Nodes maintain a small set of diagnostic counters and proactively shout about them when things go wrong.

Security events — TID_DG_SECURITY_EVENT — track HMAC verification failures, replay attack attempts, denial-of-service rate-limiting, and unauthorised onboarding attempts. Each event has a counter (so you can see whether something happened once or is happening continuously) and the IP address of the most recent offending packet. Treat the IP as a hint, not gospel — UDP source addresses can be trivially forged — but for diagnosing a broken cable or a misconfigured device, it's invaluable.
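A counter plus a most-recent-address pair is a simple structure to picture. The sketch below is not the TID_DG_SECURITY_EVENT wire format, which this post does not describe; the event labels and class names are invented to show why a counter and a last-seen IP are useful together.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical labels for the four event types listed above.
EVENT_KINDS = ("hmac_failure", "replay", "rate_limited", "unauthorised_onboarding")


@dataclass
class SecurityEvent:
    count: int = 0
    last_source_ip: Optional[str] = None  # a hint only: UDP sources can be forged


class SecurityLog:
    def __init__(self) -> None:
        self.events: Dict[str, SecurityEvent] = {k: SecurityEvent() for k in EVENT_KINDS}

    def record(self, kind: str, source_ip: str) -> None:
        event = self.events[kind]        # raises KeyError for an unknown kind
        event.count += 1
        event.last_source_ip = source_ip  # only the most recent offender is kept

    def is_recurring(self, kind: str) -> bool:
        # A count above one separates a one-off from an ongoing problem.
        return self.events[kind].count > 1
```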

Diagnostic messages — TID_DG_MESSAGE — are free-form human-readable strings the Node fires when something internal happens. "Fan failed", "over-temperature", "DMX UART error". They're aggressively rate-limited so a chattering fault can't flood the network.
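One common way to rate-limit this kind of message is to enforce a minimum interval per distinct string. The sketch below uses that approach as an assumption; the post does not say which scheme or interval Sig-Net actually uses, and the names are hypothetical.

```python
from typing import Dict


class DiagnosticLimiter:
    """Hypothetical per-message limiter: at most one send per interval per string."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s   # assumed value, not from the spec
        self._last_sent: Dict[str, float] = {}

    def should_send(self, message: str, now: float) -> bool:
        last = self._last_sent.get(message)
        if last is not None and now - last < self.min_interval_s:
            return False                       # chattering fault: stay quiet
        self._last_sent[message] = now
        return True
```

With a 10-second interval, a fan that reports "Fan failed" every second would reach the network once every ten seconds, while a different message such as "over-temperature" is still sent immediately.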

Level foldback — TID_DG_LEVEL_FOLDBACK — is a beautiful piece of diagnostic kit. On request, a Node hands you back the actual 512-byte DMX buffer it's currently driving on a specific endpoint. So when an operator says "this fixture is at 50% and I don't know why", you can ask the gateway directly what value it's outputting and trace the merge from there.
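Once you have the foldback buffer in hand, answering "what is this fixture actually getting?" is a lookup. The helpers below are a sketch that assumes the 512 bytes are slots 1 to 512 in order with no start code in front; how the buffer is requested and framed on the wire is not covered here.

```python
DMX_SLOTS = 512


def slot_level(foldback: bytes, address: int) -> int:
    """Return the raw 0-255 level for a 1-based DMX address."""
    if len(foldback) != DMX_SLOTS:
        raise ValueError(f"expected {DMX_SLOTS} bytes, got {len(foldback)}")
    if not 1 <= address <= DMX_SLOTS:
        raise ValueError(f"DMX address out of range: {address}")
    return foldback[address - 1]  # assumption: index 0 holds slot 1


def slot_percent(foldback: bytes, address: int) -> int:
    """Return the same level as a rounded percentage, as an operator would say it."""
    return round(slot_level(foldback, address) * 100 / 255)
```

If address 10 reads back as 128, `slot_percent` reports 50, which tells you the gateway really is outputting half and the cause lies upstream in the merge.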

A few things to remember when troubleshooting

If a new light won't take any commands, it's probably offboarded — feed it the K0 passphrase via its front panel.

If a visiting console can see the rig but every patch change fails, it's been given Guest Manager keys; either ask the house engineer to do the patch or get the keys upgraded to Equal.

If you change an IP and a gateway disappears, wait 60 seconds — the rollback timer should bring it back to its old settings. If it doesn't, you went offline before the Manager confirmed the change.

If you can't wipe a Node remotely, the 5-minute window after power-up has probably expired. Power-cycle the device and send the wipe within 5 minutes.
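The wipe rule in the last item comes down to a check against time since power-up. A minimal sketch, assuming the window is measured from power-up and that the boundary itself still counts (the text does not say); the function name is hypothetical.

```python
WIPE_WINDOW_S = 5 * 60  # from the text: send the wipe within 5 minutes of power-up


def remote_wipe_allowed(seconds_since_power_up: float,
                        window_s: float = WIPE_WINDOW_S) -> bool:
    """Return True while a remote wipe would still be accepted."""
    # Assumption: the window is inclusive of its last instant.
    return 0 <= seconds_since_power_up <= window_s
```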

Next post: how house venues, touring consoles, main desks and backup desks all share one Sig-Net network without stepping on each other.


This series is based on v0.20 of the Sig-Net spec - visit Sig-Net.net for any updates.