A few days ago, it was brought to our attention that publishing a server on mocktastic was intermittently failing. One of the impacted users was kind enough to share a lot of information regarding this, which ultimately led to us diagnosing and fixing the issue.
First, we would like to apologize to all our customers who were impacted by this issue, and for the delay in fixing it (due to extended holidays in our part of the world). Thank you for putting up with us until we managed to fix it.
Here is a timeline of the issue faced:
- Publishing servers worked as intended up to October 11th.
- On October 11th we installed a new API gateway server, in order to better control access to our public APIs.
- Post-installation tests of the gateway with our test data-set indicated that everything was working normally.
- On October 13th we were contacted by a customer facing issues with publishing their server. The publish failed repeatedly with an 'Unknown Error' message.
- On October 14th, we tested with our test data-set again, and found everything to be in working order. Analyzing the API access logs, we saw that the customer's API requests to publish were being rejected with a 413 Request Entity Too Large error.
- We increased the allowed payload size on the publish endpoint to 50MB, in order to mitigate the issue.
- On October 15th, after we emailed the customer about our fix, the customer reported that the issue was still unresolved and reproducible.
- We analyzed the logs again, and found that requests were still being rejected with a 413 error code.
- Further analysis suggested that the request was being dropped before being forwarded to our API server.
- We researched our API gateway documentation, and found that the default max client body size for POST requests had been set too low. We boosted this to 50MB to match our endpoint configuration.
- The customer confirmed that the issue had been mitigated and that they were able to publish servers again.
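The timeline above comes down to a request passing through two body-size checks in order: the gateway first, then the API server. The Python sketch below is only an illustration of that behavior, not our actual configuration. The layer names, the 1MB gateway default and the 5MB payload are hypothetical values chosen for the example.

```python
from typing import Optional, Sequence, Tuple

MB = 1024 * 1024


def first_rejecting_layer(
    body_size: int, layers: Sequence[Tuple[str, int]]
) -> Optional[str]:
    """Return the name of the first layer whose limit the body exceeds, or None."""
    for name, max_body_size in layers:
        if body_size > max_body_size:
            # This layer would answer 413; later layers never see the request.
            return name
    return None


# Hypothetical limits: 1MB stands in for the low gateway default.
after_first_fix = [("gateway", 1 * MB), ("api_server", 50 * MB)]
after_second_fix = [("gateway", 50 * MB), ("api_server", 50 * MB)]

body = 5 * MB  # hypothetical publish payload size

print(first_rejecting_layer(body, after_first_fix))   # gateway
print(first_rejecting_layer(body, after_second_fix))  # None
```

Raising the limit on the endpoint alone changes nothing while an earlier layer still has a lower one, which is why the first fix had no visible effect.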
What we learned from this
This issue primarily occurred due to our hastiness while setting up and configuring the API gateway. Since the gateway is third-party software, we should have researched its maximum POST body size setting and ensured that it was as large as we required. In future, we will thoroughly research every piece of software that we configure in order to prevent similar issues.
A secondary takeaway from this incident was that our test data-set was not covering all possible scenarios. We have since modified our data-set to take into account large payload sizes.
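As a rough illustration of what covering large payload sizes can look like, the Python sketch below builds request bodies of an exact byte size and lists sizes around a limit. It is a minimal sketch and not our actual test suite: the `data` field name and both helper functions are hypothetical, and whether a body of exactly the limit is accepted depends on the gateway in use.

```python
import json
from typing import List

LIMIT_BYTES = 50 * 1024 * 1024  # the 50MB limit mentioned above


def make_payload(target_bytes: int) -> bytes:
    """Build a JSON body whose encoded size is exactly target_bytes."""
    # Size of the JSON envelope with an empty "data" field (hypothetical name).
    envelope = len(json.dumps({"data": ""}).encode("utf-8"))
    padding = target_bytes - envelope
    if padding < 0:
        raise ValueError("target size is smaller than the JSON envelope")
    # ASCII padding adds exactly one byte per character.
    return json.dumps({"data": "x" * padding}).encode("utf-8")


def boundary_sizes(limit: int) -> List[int]:
    """Sizes worth testing: well below, just below, at, and just above the limit."""
    return [limit // 2, limit - 1, limit, limit + 1]
```

Sending one request per size from `boundary_sizes(LIMIT_BYTES)` through the gateway, rather than directly to the API server, exercises every layer that enforces a limit.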
We will continue to write post-mortems for any production issues that impact the mocktastic web service and desktop app. Once again, we would like to apologize to our customers for the difficulties caused.
Thank you for your continued faith in us.