Roots cloning is frozen
Incident Report for Fluid Attacks
Postmortem

Impact

At least 4 organizations experienced delays in cloning their repositories (UTC-5 23-08-29 17:14 to 23-08-31 13:30; time to recover: about 2 days). The incident was detected reactively (UTC-5 23-08-31 11:36; time to detect: about 2 days) when several users noticed multiple cloning problems and reported them through our help team [1].

Cause

The merge request at [2] caused the servers that process these jobs to send requests to a non-existent SQS queue. The resulting exception left the server unresponsive and unable to process tasks, so pending tasks accumulated.

Solution

The engineering team reverted the commit that introduced the problem [3].

Conclusion

The server that manages these operations must handle exceptions so that a single failure does not stop other processes or freeze the system. UNHANDLED_EXCEPTIONS
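
As an illustration of this conclusion, the following is a minimal sketch of a worker loop that isolates failures per task. It is not the actual server code: the names `run_worker`, `clone_root`, `QueueDoesNotExist`, and the task identifiers are hypothetical, and the missing queue is simulated rather than reached through a real SQS client.

```python
import logging
from collections import deque
from typing import Callable, Deque, List, Tuple

logger = logging.getLogger("worker")


class QueueDoesNotExist(Exception):
    """Hypothetical stand-in for the error raised when a queue is missing."""


def run_worker(
    tasks: Deque[str], handler: Callable[[str], None]
) -> Tuple[List[str], List[str]]:
    """Process every pending task; a failing task is logged and set aside
    instead of stopping the loop."""
    done: List[str] = []
    failed: List[str] = []
    while tasks:
        task = tasks.popleft()
        try:
            handler(task)
        except Exception:
            # Broad on purpose: one bad task must not halt the worker.
            logger.exception("task %s failed; continuing with the next one", task)
            failed.append(task)
        else:
            done.append(task)
    return done, failed


def clone_root(task: str) -> None:
    # Hypothetical handler: simulates a request to a queue that does not exist.
    if task == "root-b":
        raise QueueDoesNotExist("queue not found")


done, failed = run_worker(deque(["root-a", "root-c", "root-b"]), clone_root)
```

In this sketch the failing task ends up in `failed` while the remaining tasks are still processed, so a single bad request does not cause a backlog. Failed tasks could then be retried or reported, depending on the system's needs.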

Posted Sep 18, 2023 - 18:57 GMT-05:00

Resolved
The problem has been solved and the clones are working correctly.
Posted Aug 31, 2023 - 13:30 GMT-05:00
Identified
We have identified repositories whose cloning has been frozen for a long time.
Posted Aug 31, 2023 - 11:36 GMT-05:00
This incident affected: Roots Cloning.