Quickly recover from failures using automated SQS DLQ redrive

Two years ago at re:Invent, AWS announced DLQ redrive, which moves messages from a dead-letter queue (DLQ) back to the original queue. Previously, this feature was only available in the AWS console. Now, DLQ redrive is also available through the AWS SDK and CLI.

The new feature makes it possible to handle failed messages in DLQs automatically and to increase the reliability of asynchronous applications.

This sample architecture includes a Lambda function that sends messages to a third-party API. If the third-party API becomes unavailable, processing fails and the messages are sent to the DLQ. Previously, operations teams had to run the DLQ redrive manually in the AWS console. A Lambda function that uses the AWS SDK can automate this process.
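For context, a queue sends failed messages to a DLQ because of its redrive policy. The following is a minimal sketch of how such a policy could be built in Python. The helper name `build_redrive_policy` and the default of three receive attempts are assumptions for illustration, not part of the sample architecture.

```python
import json


def build_redrive_policy(dlq_arn, max_receive_count=3):
    """Build the SQS queue attribute that routes failed messages to a DLQ.

    Hypothetical helper: after a message has been received
    `max_receive_count` times without being deleted, SQS moves it to the
    queue identified by `dlq_arn`.
    """
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the policy as a JSON string under the "RedrivePolicy" key.
    return {"RedrivePolicy": json.dumps(policy)}


# Possible usage with a boto3 SQS client (queue URL and ARN are placeholders):
#   sqs.set_queue_attributes(
#       QueueUrl=source_queue_url,
#       Attributes=build_redrive_policy(dlq_arn),
#   )
```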

How can you automate this process? Implement the logic from the diagram:

  1. The SQS queue sends failed messages to the DLQ.
  2. A time-based EventBridge event triggers a Lambda function periodically (e.g. every 5 minutes).
  3. The Lambda function checks whether the DLQ contains failed messages. If it does, the function calls a health check of the third-party backend.
  4. If the health check is successful, the Lambda function starts the DLQ redrive using the AWS SDK call StartMessageMoveTask.

If the third-party backend is still failing, no redrive is started and the check repeats five minutes later. This continues until the DLQ is empty again.
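Steps 3 and 4 can be sketched as a Python Lambda handler using boto3. This is an illustration under assumptions rather than a reference implementation: the environment variables `DLQ_URL`, `DLQ_ARN` and `HEALTH_CHECK_URL`, the function names, and the rule that any 2xx response counts as healthy are all hypothetical.

```python
import os
import urllib.request


def dlq_message_count(sqs, dlq_url):
    """Return the approximate number of visible messages in the DLQ."""
    response = sqs.get_queue_attributes(
        QueueUrl=dlq_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


def backend_is_healthy(url, timeout=5):
    """Assumed health check: a 2xx response from `url` means healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers connection errors, timeouts and HTTP error statuses.
        return False


def redrive_if_healthy(sqs, dlq_url, dlq_arn, health_check):
    """Start a redrive only if the DLQ has messages and the backend is up."""
    if dlq_message_count(sqs, dlq_url) == 0:
        return "dlq-empty"
    if not health_check():
        return "backend-unhealthy"
    # Without a DestinationArn, messages go back to their original queue.
    sqs.start_message_move_task(SourceArn=dlq_arn)
    return "redrive-started"


def handler(event, context):
    # Imported here so the functions above can be tested without boto3.
    import boto3

    sqs = boto3.client("sqs")
    return redrive_if_healthy(
        sqs,
        os.environ["DLQ_URL"],
        os.environ["DLQ_ARN"],
        lambda: backend_is_healthy(os.environ["HEALTH_CHECK_URL"]),
    )
```

Passing the SQS client and the health check in as arguments keeps the decision logic testable with stand-ins. Note that SQS allows only one active message move task per queue at a time, so a production version would probably need to handle the case where a previous redrive is still running.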

For more details on the new API calls, read the official AWS blog post.





