How to search and find related logs in AWS Log Groups

Hiring

We are Binx. We make every organization cloud-native.

You can search across log groups by setting up a correlation id on your log lines. You can then use CloudWatch Logs Insights to query all related log messages.

If you are dealing with different lambda functions in a project, you might not have a logging system yet. Finding and troubleshooting using Cloudwatch Logs could become cumbersome. Especially if you try to track a flow through the different log groups.

In CloudWatch Logs Insights you can execute queries across log groups. In our case we want to collect related log lines across log groups. To achieve this we need to annotate the log lines with a correlation_id that ties them together.

Sample application

We will use the following sample application to show how to setup a correlation id on your log lines. The application looks as followed:

  1. API Gateway that is invoking an AWS Lambda function
  2. A api Lambda function places the body in a SQS Queue
  3. A callback Lambda function is receiving messages from the SQS Queue
Architecture of the Sample Application

Every time that an API call executes, the apifunction is invoked. We can annotate the log lines by using the Lambda Powertools for Python.

from aws_lambda_powertools import Logger
from aws_lambda_powertools.logging import correlation_paths
from aws_lambda_powertools.utilities.typing import LambdaContext
from aws_lambda_powertools.utilities.data_classes import (
    event_source,
    APIGatewayProxyEvent,
)
logger = Logger(service="api")

@event_source(data_class=APIGatewayProxyEvent)
@logger.inject_lambda_context(correlation_id_path=correlation_paths.API_GATEWAY_REST)
def lambda_handler(event: APIGatewayProxyEvent, context: LambdaContext) -> dict:
    logger.info("This message now has a correlation_id set")

We take the requestContext.requestId from the incoming event. We can use that as a correlation_id. (This is also returned to the caller as a x-amzn-requestid header by default by API Gateway.)

To be able to search across log groups using the same correlation id. We need to pass it along with the message as an attribute when we send it to SQS.

    response = client.send_message(
        QueueUrl=os.environ.get("QUEUE_URL"),
        MessageBody=json.dumps(event.body),
        MessageAttributes={
            "correlation_id": {
                "DataType": "String",
                "StringValue": event.request_context.request_id,
            }
        },
    )

I used a correlation_id attribute to pass the correlation id with the SQS message.

The queue will trigger the callback function with a batch of messages. You need to loop through the messages in the batch. Process them and remove them from the queue.
Or you use the sqs_batch_processor from the powertools.

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.typing import LambdaContext
from aws_lambda_powertools.utilities.batch import sqs_batch_processor
from aws_lambda_powertools.utilities.data_classes.sqs_event import SQSRecord
logger = Logger(service="callback")

def record_handler(record: dict) -> None:
    record = SQSRecord(record)
    # Get the correlation id from the message attributes
    correlation_id = record.message_attributes["correlation_id"].string_value
    logger.set_correlation_id(correlation_id)
    logger.info(f"Processing message with {correlation_id} as correlation_id")

@sqs_batch_processor(record_handler=record_handler)
def lambda_handler(event: dict, context: LambdaContext) -> None:
    logger.set_correlation_id(None)
    logger.info(f"Received a SQSEvent with {len(list(event.records))} records")

The sqs_batch_processor decorator will call the record_handler method for each record received. No need for you to handle the deletion of messages.

In the end we do set the correlation id to None. We do this because the rest of the log lines are not related to the message received.

Show me!

If you call the API using a simple curl command, you can see the correlation_id (Note: this is also in the headers)

initial request using curl

So imagine that something went wrong. We know the correlation_id so we will use CloudWatch Logs Insights:

fields level, service, message
| filter correlation_id="263021ca-fb82-41ff-96ad-2e933aadbe0a"
| sort @timestamp desc
Sample query of AWS CloudWatch Logs Insights

As you can see there are 3 results 2 lines originated from the api service and 1 from the callback service. If have an interest in how to create queries you can read the CloudWatch Logs Insights query syntax.

Conclusion

The Lambda Powertools for Python make it easy
to add a correlation_id. By using the structured logs from the powertools you can use each property in the log line to
query on. It has a lot more utilities, so I encourage go check it out!

Not a python developer? There is a Java
version and there are some initiatives on the roadmap
to add other languages as well.

Joris has been working with the AWS cloud since 2009 and focussing on building event driven architectures. While working with the cloud from (almost) the start he has seen most of the services being launched. Joris strongly believes in automation and infrastructure as code and is open to learn new things and experiment with them, because that is the way to learn and grow. In his spare time he enjoys running and runs a small micro brewery from his home.
Share this article: Tweet this post / Post on LinkedIn