At least three users experienced indefinite waiting times when generating executive, technical and export reports (UTC-05 23-07-31 09:43 to 23-08-01 12:34: 1.1 days -time to recover-). The incident was detected reactively (at UTC-05 23-08-01 09:03: 0.9 days -time to detect-) by a user who reported through our helpdesk [1] that he had generated the reports through the platform, but they never arrived in his email.
A job definition in our infrastructure code was deleted under the impression that it was not in use [2]. However, this job turned out to be crucial for the batch processing that produces the reports. Without it, the reports could not be generated: the requests were left waiting for a job that was no longer deployed.
The engineering team restored the job definition and re-ran the generation of the queued reports [3].
A job that report generation depends on was deleted, partly because its relevance in the infrastructure was not recognized and partly because there were no tests to catch the problem before it reached production. MISSING_TEST
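
As a rough illustration of the kind of test that was missing, the sketch below compares the job names the application submits against the job names defined in the infrastructure code and reports any that have no definition. Everything in it is an assumption made for illustration: the function name, the job names, and the idea that both sets can be collected as plain strings are hypothetical and do not describe our actual code.

```python
def find_missing_job_definitions(referenced_jobs, defined_jobs):
    """Return the referenced job names that have no definition, sorted."""
    return sorted(set(referenced_jobs) - set(defined_jobs))


# Hypothetical job names, for illustration only.
referenced = {"generate_report", "send_notification"}
defined = {"send_notification"}

missing = find_missing_job_definitions(referenced, defined)
if missing:
    # In a CI pipeline, this branch would fail the build instead of printing.
    print("Jobs referenced but not defined:", ", ".join(missing))
```

A check of this shape, run before deployment, would flag a deleted definition that is still referenced instead of leaving the affected requests queued in production.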