Elevated Image API Errors on us-east-1
Incident Report for Imagizer Cloud
Postmortem

Early this morning, Imagizer Cloud's east coast zone underwent a large-scale outage. As a result, our Image API service was severely degraded.

The outage began at 5 am PST and ended at 7:15 am PST, affecting all east-coast regional customers.

What and why it happened

The cause of the outage was a substantial increase in resource-intensive requests, which gradually increased throughout the early morning.

Unfortunately, the unusual pattern of requests went undetected by our autoscaling services. Once reaching a tipping point, our servers began to reject requests, causing timeouts and other 5XX error responses.

We also failed to react in time to mitigate the outage.

Actions in progress

  • Minimum Cluster size increased.
  • Add additional triggers to the autoscaling service
  • Complete an audit of our autoscaling service
  • Complete an audit of our in-house monitoring/alerting services
  • Add third-party monitoring/alerting

We take all outages seriously and always work hard on improving our methods to provide the safest and most stable environment for our clients.

I sincerely apologize for the downtime.

--

Nicholas Pettas

CTO

Posted Apr 09, 2021 - 18:30 PDT

Resolved
This incident has been resolved.
Posted Apr 09, 2021 - 07:21 PDT
Update
We are continuing to monitor for any further issues.
Posted Apr 09, 2021 - 07:21 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Apr 09, 2021 - 07:17 PDT
Identified
We have identified the issue causing the elevated level of Image API errors and are currently implementing a fix.
Posted Apr 09, 2021 - 07:14 PDT
This incident affected: Image API (us-east-1).