Elevated Image API Errors on us-east-1

Incident Report for Imagizer Cloud

Postmortem

Early this morning, Imagizer Cloud's east coast zone underwent a large-scale outage. As a result, our Image API service was severely degraded.

The outage began at 5 am PST and ended at 7:15 am PST, affecting all east-coast regional customers.

What and why it happened

The cause of the outage was a substantial increase in resource-intensive requests, which gradually increased throughout the early morning.

Unfortunately, the unusual pattern of requests went undetected by our autoscaling services. Once reaching a tipping point, our servers began to reject requests, causing timeouts and other 5XX error responses.

We also failed to react in time to mitigate the outage.

Actions in progress

Minimum Cluster size increased.
Add additional triggers to the autoscaling service
Complete an audit of our autoscaling service
Complete an audit of our in-house monitoring/alerting services
Add third-party monitoring/alerting

‌

We take all outages seriously and always work hard on improving our methods to provide the safest and most stable environment for our clients.

I sincerely apologize for the downtime.

Nicholas Pettas

CTO

Posted Apr 09, 2021 - 18:30 PDT

Resolved

This incident has been resolved.

Posted Apr 09, 2021 - 07:21 PDT

Update

We are continuing to monitor for any further issues.

Posted Apr 09, 2021 - 07:21 PDT

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Apr 09, 2021 - 07:17 PDT

Identified

We have identified the issue causing the elevated level of Image API errors and are currently implementing a fix.

Posted Apr 09, 2021 - 07:14 PDT

This incident affected: Image API (us-east-1).