Impact
An unknown number of users experienced increased response times when accessing the Platform (at UTC-5 23-11-08 12:20 to 23-11-08 14:43 | Time to recover was 43 minutes). The incident was detected proactively (at UTC-5 23-11-08 13:59 | Time to detect was 1.6 hours) by one of our monitoring tools and by staff members, which reported web response times above 5 seconds in this component.
Cause
As part of our effort to improve the technologies we use, aiming to simplify and add value, a new experimental library was being tested in the authorization module [1]. When a new version of the Platform was deployed, this change degraded performance, leading to increased response times.
Solution
The engineering team rolled the Platform back to the previous version, which does not include the experimental library changes [2].
Conclusion
A satisfactory and accurate way to measure performance degradation before changes reach production has not yet been implemented. The engineering team is still investigating ways to put such measures in place and keep these situations under control. IMPOSSIBLE_TO_TEST
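
As an illustration only, one possible shape for a pre-production check is a latency gate that compares response-time samples from a candidate build against a baseline. The sketch below is not the engineering team's approach, and nothing in it comes from the Platform itself: the function names, the 20% regression tolerance, and the way samples are collected are all assumptions. The 5-second ceiling simply mirrors the alert level mentioned under Impact.

```python
from statistics import quantiles

# Assumption: mirrors the 5-second alert level reported under Impact.
MAX_P95_SECONDS = 5.0
# Assumption: a hypothetical tolerance; the candidate may be at most 20% slower.
MAX_REGRESSION_RATIO = 1.2


def p95(samples):
    """Return the 95th percentile of response times given in seconds."""
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20)[-1]


def passes_latency_gate(baseline, candidate):
    """Return True if the candidate build's p95 latency is acceptable.

    Both arguments are lists of response times in seconds, gathered by
    whatever load test is available (hypothetical, not specified here).
    """
    base_p95 = p95(baseline)
    cand_p95 = p95(candidate)
    within_ceiling = cand_p95 <= MAX_P95_SECONDS
    within_tolerance = cand_p95 <= base_p95 * MAX_REGRESSION_RATIO
    return within_ceiling and within_tolerance
```

A gate like this is only as reliable as the load it is measured under, which may be part of why a satisfactory measure is hard to put in place before production.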