Almost every developer works with web services, either as a provider or as a client. Web services are usually exposed via SOAP, and these days RESTful web services are common too. What all types of web services have in common is HTTP: it is the base protocol that SOAP and REST implementations build on. So all web services are prone to HTTP, Internet and LAN issues, as well as issues on the provider's side. This post tries to explain which of these errors should be retried by client code before giving up.
Development Mode Approach – A fairy tale !
When we develop client-side code for a web service, we go through flows that rarely fail because of Internet/LAN errors or mistakes on the provider's side. So we never think about retrying on such failures, and we also tend to feel that once the solution runs in production on A-grade, super-fast Internet lines, these issues will not occur. But this fairy tale turns into a nightmare when, in production, we get strange errors that are completely unexpected and hard to reproduce and debug.
Realistic Approach – Lesser Production Nightmares !
Life is not a fairy tale, so we should RETRY on certain errors from web service calls. The topic is pretty wide and it is not easy to cover all the known issues, so I tried to pick common error codes and failure reasons for two popular cloud web service providers, Salesforce and Amazon.
Below is a table that lists these error codes and explains which of them should be retried by web service client code.
|Error||Code||Retry||“Why” or “Why Not” Retry ?|
|Unknown Host||n/a||YES||An unknown host error can be caused by temporary network issues, so we should wait and try to reconnect.|
|Service Unavailable||503||YES||Pushing updates or a maintenance window does not take long and is usually announced by the provider, so we should wait for the known period and then retry.|
|Temporary Redirect||307||YES||As the name says, it's a temporary redirect, so we can retry on the same endpoint again.|
|Request Timeout||400||YES||A request can time out because of network issues or because you are querying too much data from the web service. If the failure is caused by network issues, we should retry; otherwise, tune the web service request to reduce the queried data.|
Internal Errors at Server Side
Whether to RETRY or not depends on the web service provider. Many providers, like Amazon, document which internal errors to retry. For providers without such documentation, we should wait for a while and then retry.
|409||NO||We should optimize the client code to ensure proper locking, so that multiple threads don’t race for the same resource.|
|503||NO||We should fix the client to slow down or queue requests. Another cool option offered by many providers is the ability to batch multiple requests, so those options should be tried on the client side.|
Most web service providers issue a login token in the form of keys, session IDs, etc. Sometimes these tokens have a limited lifetime, so client code should try renewing them on such errors.
|Bad Request/Digest, Incomplete Body, Invalid Argument, Malformed XML, Malformed POST Request, Missing Content Length||400||NO||Client code should be fixed to form correct requests.|
|Access Denied||403||NO||Try different credentials|
|Wrong End Point, PermanentRedirect||301 / 400||NO||Client code is using an incorrect endpoint. The endpoint URL should be fixed.|
|MethodNotAllowed||405||NO||Client code should be fixed.|
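The decisions in the table above can be sketched as a small retry wrapper. This is only a minimal illustration, not any provider's client API: `TransientError`, `SessionExpiredError`, `call_with_retry` and its parameters are hypothetical names, and the exponential backoff (1s, 2s, 4s, ...) is one assumed way to "wait for a while". Real client code would map the provider's actual error responses onto these exception types.

```python
import socket
import time


class TransientError(Exception):
    """Hypothetical: a failure worth retrying (e.g. 503, request timeout)."""


class SessionExpiredError(Exception):
    """Hypothetical: the provider rejected an expired token or session id."""


def call_with_retry(call, renew_token, max_attempts=4, base_delay=1.0,
                    sleep=time.sleep):
    """Run call(), retrying transient failures with exponential backoff.

    An expired session triggers one token renewal and an immediate retry.
    Any other exception (bad request, access denied, ...) propagates at
    once, because retrying cannot fix a broken request.
    """
    renewed = False
    attempt = 1
    while True:
        try:
            return call()
        except SessionExpiredError:
            if renewed:
                raise  # renewing once did not help, so give up
            renew_token()
            renewed = True
        except (TransientError, socket.gaierror, TimeoutError):
            # socket.gaierror covers "unknown host" (DNS lookup failure)
            if attempt >= max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1
```

The `sleep` parameter exists only so that the waiting can be replaced in tests; in production code the default `time.sleep` is used.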
In general, one can also follow some simple rules based on HTTP status codes. These rules do not apply to every web service provider, but most of the time you will end up making the right RETRY or NOT decision. The table below explains this.
|HTTP STATUS CODE||MEANING ?||RETRY ?||“Why” or “Why Not” Retry ?|
|307||Temporary Redirect||YES||Retry after a while.|
|400||Bad Request||NO||Fix the client-side code to form the request correctly.|
|403||Forbidden||NO||Fix the credentials in the client-side code.|
|405||Method Not Allowed||NO||Fix the HTTP call to use the correct method.|
|409||Conflict||YES||The client is racing for the same resource. Add locking in the client code so that it does not request conflicting actions on the same resource. If locking is not possible, wait for a while and retry.|
|411||Length Required||NO||Client must provide the Content-Length HTTP header.|
|500||Internal Server Error||YES||The client-side code is correct; something on the web service provider’s side failed. So retry after a while.|
|501||Not Implemented||NO||The client is trying to use functionality that is not yet implemented.|
|503||Service Unavailable||YES||You may retry here if the service is down for a while, for example for maintenance.|
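As a rough sketch, the table above can be expressed as a lookup that client code consults before retrying. The names `RETRY_BY_STATUS` and `should_retry` are made up for this example, and treating unlisted status codes as "do not retry" is an assumption of the sketch, not a rule from any provider.

```python
# Retry decisions copied from the table above.
RETRY_BY_STATUS = {
    307: True,   # Temporary Redirect
    400: False,  # Bad Request
    403: False,  # Forbidden
    405: False,  # Method Not Allowed
    409: True,   # Conflict (prefer locking; otherwise wait and retry)
    411: False,  # Length Required
    500: True,   # Internal Server Error
    501: False,  # Not Implemented
    503: True,   # Service Unavailable
}


def should_retry(status_code):
    """Return True if the table marks this HTTP status as worth retrying.

    Assumption: status codes missing from the table are not retried.
    """
    return RETRY_BY_STATUS.get(status_code, False)
```

For example, `should_retry(503)` says to retry, while `should_retry(400)` says to fail fast and fix the request instead.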
Open Source Project Coming Soon !
I am about to release an open source project for Salesforce and Amazon that helps you write Retryable client side code easily. Stay connected, I will post updates.