Platform performance degradation
Incident Report for Fluid Attacks
Postmortem

Impact

An unknown number of users experienced extended response times on Fluid Attacks' Platform (from UTC-5 23-11-09 07:45 to 23-11-09 11:10 | Time to recover was about 3.4 hours). The incident was detected proactively (at UTC-5 23-11-09 07:54 | Time to detect was 9 minutes) by one of our monitoring tools and by staff members, which reported web response times above 5 seconds for this component.
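The detection rule mentioned here, web response times above 5 seconds, can be sketched as a simple threshold check. This is only an illustration and not the monitoring tool's actual configuration: the function name, the sample values, and the choice to evaluate the median of a window are all assumptions.

```python
from statistics import median

# Threshold taken from the report: web responses above 5 seconds.
THRESHOLD_SECONDS = 5.0


def is_degraded(samples: list[float], threshold: float = THRESHOLD_SECONDS) -> bool:
    """Return True when the median response time of a window exceeds the threshold.

    Using the median (an assumption) keeps a single slow request from
    triggering an alert on its own.
    """
    if not samples:
        return False
    return median(samples) > threshold


# Hypothetical response times, in seconds, for two monitoring windows.
normal_window = [0.4, 0.6, 0.5, 7.2, 0.5]
slow_window = [5.8, 6.4, 7.1, 5.2, 9.0]

print(is_degraded(normal_window))  # False: median is 0.5
print(is_degraded(slow_window))    # True: median is 6.4
```

A real monitor would more likely alert on a percentile over a sliding time window, but the comparison against a fixed budget is the same idea.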

Cause

Our API received an abnormally high number of requests. The task serving them did not scale properly, which degraded the API's overall performance and, with it, our Platform and Agent.

Solution

The engineering team lightened the operation by deactivating some tasks, such as indicators, while investigating and optimizing this process [1][2].

Conclusion

The operation had kept that configuration for some time, but the abnormal load on this task brought the problem to light. The team is now adding performance tests for API operations in scenarios where a lot of data is loaded, and is improving observability [3].
PERFORMANCE_ERROR < MISSING_TEST
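As a rough illustration of the kind of performance test described in this conclusion, the sketch below times an operation against a large synthetic data set and fails if it exceeds a time budget. It is not the team's actual test suite: `load_items`, `measure`, the data set size, and the 5-second budget (borrowed from the alert threshold in this report) are all assumptions made for the example.

```python
import time

# Assumed budget, borrowed from the 5-second alert threshold in this report.
MAX_SECONDS = 5.0


def load_items(count: int) -> list[dict]:
    """Hypothetical stand-in for an API operation that loads a lot of data."""
    return [{"id": i, "value": i * 2} for i in range(count)]


def measure(operation, *args) -> float:
    """Return the wall-clock seconds taken by one call to the operation."""
    start = time.perf_counter()
    operation(*args)
    return time.perf_counter() - start


def test_load_items_stays_within_budget() -> None:
    # Run the operation against a large, synthetic data set.
    elapsed = measure(load_items, 100_000)
    assert elapsed < MAX_SECONDS, f"took {elapsed:.2f}s, budget {MAX_SECONDS}s"


test_load_items_stays_within_budget()
```

Running such a test with data volumes close to production sizes is what would expose an operation that works at small scale but does not hold up under heavy load.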

Posted Nov 09, 2023 - 16:58 GMT-05:00

Resolved
The incident has been resolved and https://app.fluidattacks.com now has adequate response times.
Posted Nov 09, 2023 - 15:59 GMT-05:00
Identified
Users are experiencing high response times when accessing https://app.fluidattacks.com.
Posted Nov 09, 2023 - 07:45 GMT-05:00
This incident affected: Platform.