General availability issues
Incident Report for Fluid Attacks


An unknown number of users found the Fluid Attacks Platform, the API, and the Agent unavailable. The issue began on 2024-03-05 at 19:47 (UTC-5) and was proactively detected 6 minutes later (TTD) by one of our monitoring tools, which indicated a total outage of these components. The problem was fixed 10 minutes after detection (TTF), for a total impact of 16 minutes (TTR).
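The timeline arithmetic above (TTD + TTF = TTR) can be sketched as follows, using the timestamps from this report; the variable names are illustrative:

```python
from datetime import datetime, timedelta

# Timestamps from the report (UTC-5): outage start, detection, and fix.
started = datetime(2024, 3, 5, 19, 47)
detected = started + timedelta(minutes=6)    # detected 6 minutes after start
fixed = detected + timedelta(minutes=10)     # fixed 10 minutes after detection

ttd = (detected - started).seconds // 60     # time to detect
ttf = (fixed - detected).seconds // 60       # time to fix
ttr = (fixed - started).seconds // 60        # total impact

print(ttd, ttf, ttr)  # 6 10 16
```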


During infrastructure preparation for IPv6 utilization, a routing change was made to our cloud environment, documented in this issue [1] and implemented through this MR [2]. However, AWS did not immediately reflect the change, and the resulting synchronization delay left network subnets without internet access, causing unavailability across our systems.


The team redeployed the change to synchronize it in the cloud, enabling instances to regain internet access.


Given that the issue stemmed from an AWS error, it was impossible to test beforehand. However, moving forward, we will account for potential propagation latencies when applying routing changes within the AWS network. While completely preventing outages of this kind may be challenging, the priority of addressing such incidents is relatively low, since these components are rarely modified. INFRASTRUCTURE_ERROR < IMPOSSIBLE_TO_TEST
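One way to account for that propagation latency is to poll until the change is actually visible before declaring the deployment done. A minimal sketch, assuming a caller-supplied `is_propagated` check (for example, one wrapping an AWS describe call that confirms the expected route is present); all names here are illustrative, not part of the actual pipeline:

```python
import time


def wait_for_propagation(is_propagated, timeout=300, interval=5, sleep=time.sleep):
    """Poll until is_propagated() returns True or the timeout elapses.

    is_propagated: zero-argument callable returning a bool, e.g. one that
    describes the route table and checks that the new route exists.
    Returns True if the change propagated in time, False otherwise.
    """
    waited = 0
    while waited <= timeout:
        if is_propagated():
            return True
        sleep(interval)
        waited += interval
    return False


# Usage with a fake check that succeeds on the third poll:
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3


propagated = wait_for_propagation(fake_check, timeout=60, interval=1,
                                  sleep=lambda s: None)
print(propagated)  # True
```

Injecting the `sleep` function keeps the helper easy to exercise without real delays; a deployment pipeline would use the default `time.sleep`.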

Posted Mar 06, 2024 - 18:25 GMT-05:00

The issue was resolved, and all Fluid Attacks products are now fully available.
Posted Mar 05, 2024 - 22:24 GMT-05:00
Some Fluid Attacks products are experiencing availability issues.
Posted Mar 05, 2024 - 19:53 GMT-05:00
This incident affected: Platform, Agent, and Extensions.