This outage was a consequence of yesterday’s Megaport IX ARP storm.
By way of explanation, we maintain a database cluster of multiple nodes (servers), designed so that if one node is lost, its workload fails over to the remaining nodes. As a result of the storm, one of the nodes went offline. While the bulk of the systems failed over to their redundant machines, a service managing part of the PBX did not fail over to its redundant server.
We have identified the cause. The affected component was hardcoded to point at the failed server, when it should have pointed dynamically, via a proxy, to one of the remaining failover machines. This misconfiguration was introduced through human error during recent system maintenance, and it has since been corrected.
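
To illustrate the difference between a hardcoded target and one resolved through a proxy, here is a minimal sketch. It is not the actual configuration: the node names, the health map, and the pick_node helper are all hypothetical, and a real proxy would run its own health checks rather than read a static table.

```python
# Illustrative only. All names here are hypothetical, not the real setup.
from typing import Dict, List, Optional

def pick_node(nodes: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first node marked healthy, or None if none are.

    Stands in for what a proxy does: route to a node that is up.
    """
    for node in nodes:
        if healthy.get(node, False):
            return node
    return None

# Hypothetical three-node cluster.
CLUSTER_NODES = ["db-node-1", "db-node-2", "db-node-3"]

# Hardcoded target: always the same node, whether or not it is up.
HARDCODED_TARGET = "db-node-1"

# Simulated state after one node goes offline.
health = {"db-node-1": False, "db-node-2": True, "db-node-3": True}

# The hardcoded service keeps pointing at the offline node.
hardcoded_ok = health[HARDCODED_TARGET]

# The proxied service is routed to a surviving node instead.
proxied_target = pick_node(CLUSTER_NODES, health)
```

In this sketch the hardcoded target is unreachable once db-node-1 goes down, while the proxied lookup moves on to the next healthy node. That is the behavior the affected component should have had.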