On May 23rd, 2025, from `5:50 AM UTC` to `7:03 AM UTC`, following a scheduled deployment, some customers received mismatched data in responses from our services.
Once we recognized the severity of the issue, we initiated the rollback procedure, reverting our services to the previous code version.
At `5:50 AM UTC`, we completed a deployment that included an upgrade of a third-party provider library. The deployment introduced errors in several user sessions.
Sessions whose tokens were requested during the affected timeframe may have experienced issues with the add-to-cart and checkout functionality: adding items to the cart and placing orders did not always succeed. In addition, some orders that were placed were missing their `approved_at` and reference data.
A few minutes after the release, one of our operators detected an anomaly on the internal metrics dashboard, which was then confirmed by client requests to our support team.
The upgraded third-party provider library manages low-level web request wrappers, and the upgrade introduced problems in our APIs that were not detected by our tests in the preprod and QA environments. The upgrade had been recommended to keep our services current with the latest security patches and compatible with other components.
A few minutes after the release, once we realized the severity of the issue, we decided to roll back to the previous version of the code. This restored normal operations immediately and let us investigate the root cause offline in more depth. In addition to the rollback, we performed a partial CDN cache purge to limit how long incorrect data remained cached.
We are introducing additional automated tests covering the affected component and similar usage patterns.