Repository cloning failures
Incident Report for Fluid Attacks


An unknown number of groups experienced failures in the repository cloning process (UTC-5, 23-11-07 05:53 to 23-11-07 12:47 | Time to recover was 4.08 hours). The incident was discovered proactively (UTC-5, 23-11-07 08:46 | Time to detect was 2.8 hours) by a member of the Fluid Attacks team [1], who encountered a "No space left on device" message in various groups on the Platform.


The type of virtual server instances that Fluid Attacks uses to execute some tasks was changed [2]. The new instances that processed repository cloning have less storage space, and because of how the cloning process was architected, they ran out of disk space: the clones failed, and the error message was displayed.
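
To illustrate why the same workload can fail on instances with less storage, here is a minimal Python sketch of a disk-space check. It is not Fluid Attacks' implementation: the function names, sizes, and task counts are hypothetical and chosen only for illustration.

```python
import shutil

GIB = 1024 ** 3


def fits_on_disk(free_bytes: int, clone_bytes: int,
                 concurrent_clones: int, reserve_bytes: int = 0) -> bool:
    """Return True if all concurrent clones fit in the free space.

    Assumes every clone needs roughly `clone_bytes` at the same time,
    plus an optional reserve kept free for the operating system.
    """
    needed = clone_bytes * concurrent_clones + reserve_bytes
    return needed <= free_bytes


def free_bytes_at(path: str) -> int:
    """Return the free space, in bytes, on the filesystem holding `path`."""
    return shutil.disk_usage(path).free


# Hypothetical numbers: 10 concurrent clones of about 5 GiB each.
print(fits_on_disk(100 * GIB, 5 * GIB, 10))  # larger disk  -> True
print(fits_on_disk(30 * GIB, 5 * GIB, 10))   # smaller disk -> False
```

With these assumed figures, the workload that fits on the larger disk no longer fits on the smaller one, which is the kind of condition that surfaces as "No space left on device".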


The architecture of the cloning process was modified to reduce the number of tasks executed concurrently per instance [3].
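
One common way to cap the number of tasks running concurrently on an instance is a bounded worker pool. The sketch below assumes clone tasks are plain Python callables. The names `run_with_cap`, `PeakCounter`, and `fake_clone` are hypothetical, and the report does not state how the actual limit was implemented.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_cap(tasks, max_concurrent):
    """Run callables with at most `max_concurrent` in flight at once.

    Results are returned in submission order. `max_concurrent` must be
    at least 1, otherwise ThreadPoolExecutor raises ValueError.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


class PeakCounter:
    """Context manager that records the highest number of active tasks."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.active -= 1
        return False


def fake_clone(name, counter):
    """Stand-in for a real clone: sleeps briefly instead of using disk."""
    with counter:
        time.sleep(0.01)
    return name


if __name__ == "__main__":
    counter = PeakCounter()
    repos = ["repo-a", "repo-b", "repo-c", "repo-d"]
    tasks = [lambda n=n: fake_clone(n, counter) for n in repos]
    print(run_with_cap(tasks, max_concurrent=2))
    print("peak concurrency:", counter.peak)
```

Lowering `max_concurrent` trades throughput for a smaller peak disk footprint per instance, since fewer repositories are on disk at the same time.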


Currently, there is no test for this part of the infrastructure because it cannot be run locally, and this kind of change cannot be tested before it reaches production [4]. The product team is now working on changes to improve the cluster's robustness, reproducibility, and observability [5][6].
INFRASTRUCTURE_ERROR < IMPOSSIBLE_TO_TEST

Posted Nov 07, 2023 - 20:26 GMT-05:00

The incident has been resolved, and repository cloning is working normally.
Posted Nov 07, 2023 - 16:35 GMT-05:00
We are continuing to work on a fix for this issue.
Posted Nov 07, 2023 - 14:13 GMT-05:00
Issues have been identified in the repository cloning process.
Posted Nov 07, 2023 - 09:24 GMT-05:00
This incident affected: Cloning.