Intermittent service disruptions in Automated Software Integration
Incident Report for Fluid Attacks


An unknown number of users encountered difficulties with the Automated Software integration due to machine scaling issues in the CI. The issue started on 2024-02-06 at 08:00 (UTC-5) and was proactively discovered 1.1 days later (TTD) by the product team during their regular workflow. The problem was fixed in 2.8 hours (TTF), resulting in a total impact of 1.2 days (TTR) [1].
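As a sanity check on the reported metrics, the total impact (TTR) should be the time to detect plus the time to fix. A minimal sketch (the variable names are illustrative, not from the report):

```python
# Verify that TTD + TTF adds up to the reported TTR.
ttd_days = 1.1        # time to detect, in days
ttf_hours = 2.8       # time to fix, in hours
ttr_days = ttd_days + ttf_hours / 24

print(round(ttr_days, 1))  # → 1.2
```

The sum (about 1.22 days) rounds to the 1.2 days (TTR) stated above, so the figures are internally consistent.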


The workers' disks were nearly full, leading to failures when the workers' operating systems slightly increased in size.
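A condition like this can be caught before it causes failures with a simple disk-usage check on each worker. A minimal sketch using the Python standard library (the path and threshold are assumptions, not values from the incident):

```python
import shutil

def disk_nearly_full(path: str = "/", threshold: float = 0.9) -> bool:
    """Return True when the used fraction of the disk at `path`
    meets or exceeds `threshold` (e.g. 0.9 = 90% full)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total >= threshold

if disk_nearly_full("/", threshold=0.9):
    print("WARNING: worker disk above 90% usage")
```

Running a check like this periodically on CI workers and alerting on the result would surface disk pressure before the operating system outgrows the volume.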


The size of the workers' disks was increased [2].


Gradually, the workers' operating systems grew until they exceeded the available disk capacity, causing the issue. Increasing the disk size by 50% mitigates future occurrences. IMPOSSIBLE_TO_TEST

Posted Feb 08, 2024 - 14:25 GMT-05:00

The incident has been resolved, and the CI is now operating normally.
Posted Feb 07, 2024 - 17:22 GMT-05:00
Still investigating the root cause.
Posted Feb 07, 2024 - 15:19 GMT-05:00
The assigned developer is investigating the root cause.
Posted Feb 07, 2024 - 13:23 GMT-05:00
The CI is experiencing slowdowns and intermittent unavailability.
Posted Feb 07, 2024 - 11:45 GMT-05:00
This incident affected: Dependencies (AWS ec2-us-east-1).