Resize images on CloudFront using Lambda@Edge (Python/Pillow)


In the previous article, we looked at how to reduce the size of images during upload to S3 using AWS Lambda and .NET. However, if you are using CloudFront to distribute images and want to have a more flexible approach, you can use Lambda@Edge to resize images on the fly. This allows you to reduce traffic between CloudFront and your server, as well as reduce page load times for your users.

Architecture

The idea behind our solution is to attach a Lambda@Edge function to the CloudFront Origin Request event. This way, we can resize the image before it is sent to the user for the first time. The resized image is then cached in CloudFront.

This time we will use Python and the Pillow library to resize the image, because Lambda@Edge supports only Python and Node.js runtimes.

Simplified architecture for image resizing using CloudFront and Lambda@Edge

Implementation

Image Resize Lambda@Edge

As mentioned earlier, Lambda@Edge limits us to Python or Node.js, and we will use Python. Our function will be tied to the Origin Request event, which means it is called before CloudFront sends a request to S3 to get the image. At this point, we can:

  • Check if the image is already resized.
  • Redirect CloudFront to it if it is already resized.
  • Resize it first before redirecting if it is not yet resized.

Therefore, each image is resized only once per size and then saved in S3 and in the CloudFront cache. As a result, our function should not be called very often; how often depends on the cache settings and the geography of the users.
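For orientation, here is a trimmed-down sketch of what an Origin Request event might look like and which fields the function reads from it. The bucket name and header value are made-up placeholders, and a real event carries many more fields; `describe_request` is a hypothetical helper used only for illustration.

```python
# Trimmed-down origin request event; all values are illustrative placeholders.
sample_event = {
    "Records": [{
        "cf": {
            "request": {
                "method": "GET",
                "uri": "/kitty.jpg",
                "querystring": "size=M&to_webp=1",
                "origin": {
                    "s3": {
                        "domainName": "example-bucket.s3.amazonaws.com",
                        "customHeaders": {
                            "x-env-bucket-name": [
                                {"key": "X-Env-Bucket-Name", "value": "example-bucket"}
                            ],
                        },
                    }
                },
            }
        }
    }]
}


def describe_request(event: dict) -> dict:
    """Pull out the fields the resize function cares about (hypothetical helper)."""
    request = event["Records"][0]["cf"]["request"]
    custom_headers = request["origin"]["s3"]["customHeaders"]
    return {
        "key": request["uri"].lstrip("/"),          # S3 object key of the original
        "querystring": request["querystring"],      # raw query string, no leading '?'
        "bucket": custom_headers["x-env-bucket-name"][0]["value"],
    }


print(describe_request(sample_event))
```

Returning the (possibly modified) `request` object from the handler tells CloudFront to continue to the origin with that request, which is how the function redirects CloudFront to the resized file.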

import mimetypes
import os
import boto3
from io import BytesIO
from PIL import Image

ALLOWED_RESIZE_EXTENSIONS = ["jpg", "jpeg", "png", "gif", "bmp", "webp"]
SIZE_MAP = {
    'L': {'WIDE': (1280, 720), 'TALL': (720, 1280)},
    'M': {'WIDE': (1200, 628), 'TALL': (628, 1200)},
    'S': {'WIDE': (854, 480), 'TALL': (480, 854)},
    'XS': {'WIDE': (427, 240), 'TALL': (240, 427)},
}
SQUARE_SIZE_MAP = {'SL': (1080, 1080), 'SM': (540, 540), 'SS': (360, 360), 'SXS': (180, 180)}
SIZE_PARAMETER_NAME = "size="
SIZE_MAP.update(SQUARE_SIZE_MAP)
# SIZE_MAP already contains the square sizes after the update above
ALLOWED_RESIZE_VALUES = [SIZE_PARAMETER_NAME + size for size in SIZE_MAP]
CONVERT_TO_WEBP = "to_webp=1"

s3 = boto3.client('s3')

def lambda_handler(event, context):
    request = event['Records'][0]['cf']['request']
    request_uri: str = request['uri'].lstrip('/')
    request_uri_no_extension: str = os.path.splitext(request_uri)[0]
    request_uri_extension: str = os.path.splitext(request_uri)[1].lstrip('.').lower()
    request_querystring: str = request['querystring']

    # Lambda@Edge does not support environment variables and we don't want to hardcode values,
    # so we pass them as custom headers in the S3 origin configuration.
    # Alternatively, you can store these values in AWS Systems Manager Parameter Store.
    custom_headers = request['origin']['s3']['customHeaders']
    env_resized_path = custom_headers['x-env-resized-path'][0]['value']
    env_bucket_name = custom_headers['x-env-bucket-name'][0]['value']
    env_quality = int(custom_headers['x-env-quality'][0]['value'])

    print(f"Resized path: {env_resized_path}, Bucket name: {env_bucket_name}, Quality: {env_quality}")

    if request_uri_extension not in ALLOWED_RESIZE_EXTENSIONS:
        print(f"Not allowed extension: {request_uri_extension}. Skipping...")
        return request

    query_params: list[str] = request_querystring.split('&')
    if not any(query in query_params for query in ALLOWED_RESIZE_VALUES):
        print("No allowed size parameter found. Skipping...")
        return request

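    # take the first size parameter that is actually allowed (the check above guarantees there is one),
    # so that an unknown value such as "size=ZZ" earlier in the query string cannot cause a KeyError later
    size_parameter: str = next(query for query in query_params if query in ALLOWED_RESIZE_VALUES)[len(SIZE_PARAMETER_NAME):]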
    convert_to_webp: bool = any(query in query_params for query in [CONVERT_TO_WEBP])
    dest_file_extension = "webp" if convert_to_webp else request_uri_extension
    dest_file_path: str = f"{env_resized_path}/{size_parameter}/{request_uri_no_extension}.{dest_file_extension}"
    dest_image_exists: bool = is_s3_obj_exists(env_bucket_name, dest_file_path)

    if dest_image_exists:
        request['uri'] = '/' + dest_file_path
        print(f"Resized file already exists: {dest_file_path}. Returning...")
        return request

    source_image_exists: bool = is_s3_obj_exists(env_bucket_name, request_uri)
    if not source_image_exists:
        print(f"Source image does not exist: {request_uri}. Skipping...")
        return request

    # download original image from S3 and resize
    source_image_obj = s3.get_object(Bucket=env_bucket_name, Key=request_uri)['Body'].read()
    image: Image.Image = Image.open(BytesIO(source_image_obj))

    # resize image and save it to in-memory file
    image.thumbnail(get_size(image, size_parameter))
    in_mem_file = BytesIO()
    image.save(in_mem_file, format="webp" if convert_to_webp else image.format, quality=env_quality)
    in_mem_file.seek(0)

    # upload resized image to S3
    s3.upload_fileobj(
        in_mem_file,
        env_bucket_name,
        dest_file_path,
        ExtraArgs={
            # set proper content-type instead of default 'binary/octet-stream'
            'ContentType': mimetypes.guess_type(dest_file_path)[0] or f'image/{dest_file_extension}'
        }
    )

    # change request uri to the resized file path and return it to CloudFront
    request['uri'] = '/' + dest_file_path
    return request

def get_size(image, size_parameter):
    if size_parameter in SQUARE_SIZE_MAP:
        return SQUARE_SIZE_MAP[size_parameter]
    elif image.width > image.height:
        return SIZE_MAP[size_parameter]['WIDE']
    else:
        return SIZE_MAP[size_parameter]['TALL']

def is_s3_obj_exists(bucket, key: str) -> bool:
    objs = s3.list_objects_v2(Bucket=bucket, Prefix=key).get('Contents', [])
    return any(obj['Key'] == key for obj in objs)
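One detail of the function above is easy to miss: `Image.thumbnail` treats the requested size as a bounding box. It keeps the aspect ratio and never enlarges the image, so the result is usually smaller than the box in one dimension. The sketch below approximates that calculation in plain Python. `fit_within` is a hypothetical helper, not part of Pillow, and Pillow's own rounding may differ by a pixel in edge cases.

```python
def fit_within(width: int, height: int, box: tuple[int, int]) -> tuple[int, int]:
    """Approximate the size Image.thumbnail(box) produces: scale down to fit, never up."""
    box_width, box_height = box
    # The tighter of the two ratios wins; 1.0 caps the scale so small images stay as they are.
    scale = min(box_width / width, box_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 4000x3000 landscape photo requested with size=L uses the 'WIDE' box (1280, 720).
print(fit_within(4000, 3000, (1280, 720)))  # (960, 720)
# The same photo in portrait orientation uses the 'TALL' box (720, 1280).
print(fit_within(3000, 4000, (720, 1280)))  # (720, 960)
# An image that is already smaller than the box is left untouched.
print(fit_within(800, 600, (1080, 1080)))   # (800, 600)
```

In other words, the values in `SIZE_MAP` are upper bounds rather than exact output dimensions, and the "square" sizes do not crop the image to a square.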

Parameters

Since Lambda@Edge does not support environment variables, the parameters are passed as custom origin header values that CloudFront adds to the origin request. The following parameters are supported:

  • x-env-bucket-name - the name of the S3 bucket where the original and resized images are stored.
  • x-env-resized-path - the folder where the resized images are stored.
  • x-env-quality - the quality value (an integer, 80 by default in the CDK stack below) passed to Pillow when saving the resized image.
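If you would rather not fail with a bare `KeyError` when a header is missing, the lookup can be wrapped in a small helper. This is a minimal sketch under assumptions: `read_setting` and the fallback quality of 80 are hypothetical and not part of the function above. It relies on the same layout the handler uses, where the custom header names appear as lower-case dictionary keys.

```python
from typing import Optional


def read_setting(custom_headers: dict, name: str, default: Optional[str] = None) -> str:
    """Return a custom origin header value, or the default if the header is absent."""
    entries = custom_headers.get(name.lower())  # keys are lower-case in the event
    if entries:
        return entries[0]["value"]
    if default is None:
        raise KeyError(f"Missing required custom header: {name}")
    return default


# Illustrative input shaped like request['origin']['s3']['customHeaders'].
headers = {
    "x-env-bucket-name": [{"key": "X-Env-Bucket-Name", "value": "example-bucket"}],
    "x-env-resized-path": [{"key": "X-Env-Resized-Path", "value": "resized"}],
}

bucket = read_setting(headers, "X-Env-Bucket-Name")
quality = int(read_setting(headers, "X-Env-Quality", default="80"))  # assumed fallback
print(bucket, quality)
```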

Usage

When someone requests an image that is not yet in the CloudFront cache, the Lambda@Edge function is triggered. The function checks whether the requested resized image already exists. If not, it resizes the image and stores it in the S3 bucket. Either way, the resized image is served to the user.
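To make the mapping from a request to the stored object concrete, the sketch below reproduces the naming scheme of the handler as a standalone function. `destination_key` is a hypothetical helper written for illustration; it skips the validation the real function performs and only shows where a resized file ends up in the bucket.

```python
import os


def destination_key(uri: str, querystring: str, resized_path: str) -> str:
    """Build <resized_path>/<size>/<original key without extension>.<extension>."""
    key = uri.lstrip("/")
    stem, extension = os.path.splitext(key)
    params = querystring.split("&")
    size = next((p[len("size="):] for p in params if p.startswith("size=")), "")
    # WebP conversion changes the stored extension; otherwise keep the original one.
    target_extension = "webp" if "to_webp=1" in params else extension.lstrip(".").lower()
    return f"{resized_path}/{size}/{stem}.{target_extension}"


print(destination_key("/kitty.jpg", "size=M&to_webp=1", "resized"))  # resized/M/kitty.webp
print(destination_key("/cats/kitty.PNG", "size=SS", "resized"))      # resized/SS/cats/kitty.png
```

Because the size and the target format are both part of the key, every combination is stored as a separate object and is generated only once.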

Request an image from CloudFront like this:

https://xxxxxxxxxxx.cloudfront.net/kitty.jpg?size=<SIZE>&to_webp=<1|0>

Where:

  • size - the size of the resized image. The allowed values are predefined and hardcoded for the sake of simplicity. You can change them in the lambda_function.py file. The allowed values are:

    • XS, S, M, L for ‘wide’ and ‘tall’ images
    • SXS, SS, SM, SL for ‘square’ images
  • to_webp - if set to 1, the image will be converted to WebP format. Otherwise, the image format will be the same as the original image.
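On the client or backend side, it can help to build these URLs in one place so that the parameter names and values always match what the function expects. The following is a small sketch; `build_image_url` and `ALLOWED_SIZES` are hypothetical names, and the domain is the same placeholder used above.

```python
from urllib.parse import quote, urlencode

# Size names accepted by the Lambda@Edge function in this article.
ALLOWED_SIZES = {"XS", "S", "M", "L", "SXS", "SS", "SM", "SL"}


def build_image_url(domain: str, key: str, size: str, to_webp: bool = False) -> str:
    """Return a CloudFront URL for a resized image (hypothetical helper)."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"Unsupported size: {size}")
    params = {"size": size}
    if to_webp:
        params["to_webp"] = "1"
    # quote() keeps '/' intact, so nested keys such as 'cats/kitty.jpg' work as well.
    return f"https://{domain}/{quote(key)}?{urlencode(params)}"


print(build_image_url("xxxxxxxxxxx.cloudfront.net", "kitty.jpg", "M", to_webp=True))
# https://xxxxxxxxxxx.cloudfront.net/kitty.jpg?size=M&to_webp=1
```

Emitting the parameters in a fixed order is a deliberate choice here: URLs that differ only in parameter order are likely to be cached as separate objects.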

CDK

Prerequisites

  • AWS CDK installed and configured.
  • Docker installed and running. This is required to correctly build the Lambda function.

Stack

To deploy our solution, we will utilize AWS CDK. We will describe a Stack that will create an S3 bucket, CloudFront Distribution, and Lambda@Edge function with all necessary roles, triggers, and permissions.

ℹ️
At the time of writing, CDK’s S3Origin does not support the recommended OriginAccessControl (OAC) configuration; it uses the legacy OriginAccessIdentity (OAI) under the hood instead. To work around this limitation, we define our own origin class derived from OriginBase and override the RenderS3OriginConfig method.
    public class ImageResizeEdgeCdkStack : Stack
    {
        internal ImageResizeEdgeCdkStack(Construct scope, string id, IStackProps props = null) : base(scope, id, props)
        {
            var bucketName = new CfnParameter(this, "BucketName", new CfnParameterProps
            {
                Type = "String",
                Description = "The name of the S3 bucket to store images",
                Default = "fastfoodcoding-imageprocessing"
            });

            var destinationFolder = new CfnParameter(this, "DestinationFolder", new CfnParameterProps
            {
                Type = "String",
                Description = "The name of the folder in the S3 bucket to store resized images",
                Default = "resize"
            });

            var quality = new CfnParameter(this, "Quality", new CfnParameterProps
            {
                Type = "Number",
                Description = "The quality of the resized images",
                Default = 80
            });

            // define a private S3 bucket (all public access is blocked) to store images
            var s3Bucket = new Bucket(this, "ImageBucket", new BucketProps
            {
                BucketName = bucketName.ValueAsString,
                RemovalPolicy = RemovalPolicy.DESTROY,
                BlockPublicAccess = BlockPublicAccess.BLOCK_ALL,
            });

            // allow the lambda function to read and write objects to the bucket
            var imageResizeLambdaRole = new Role(this, "ImageResizeLambdaRole", new RoleProps
            {
                AssumedBy = new ServicePrincipal("lambda.amazonaws.com"),
                ManagedPolicies =
                [
                    ManagedPolicy.FromAwsManagedPolicyName("service-role/AWSLambdaBasicExecutionRole")
                ],
                InlinePolicies = new Dictionary<string, PolicyDocument>
                {
                    ["S3Policy"] = new PolicyDocument(new PolicyDocumentProps
                    {
                        Statements =
                        [
                            new PolicyStatement(new PolicyStatementProps
                            {
                                Actions = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                                Resources = [s3Bucket.ArnForObjects("*"), s3Bucket.BucketArn]
                            })
                        ]
                    })
                }
            });

            // define a lambda function to resize images
            var imageResizeLambda = new Amazon.CDK.AWS.Lambda.Function(this, "ImageResizeLambda", new Amazon.CDK.AWS.Lambda.FunctionProps
            {
                Runtime = Runtime.PYTHON_3_11,
                Handler = "lambda_function.lambda_handler",
                // build our function with Docker since Pillow has platform-specific dependencies. In our case, we're building for linux/amd64
                Code = Code.FromDockerBuild("../src", new DockerBuildAssetOptions { Platform = "linux/amd64" }),
                Architecture = Architecture.X86_64,
                MemorySize = 512,
                Timeout = Duration.Seconds(30),
                Role = imageResizeLambdaRole
            });

            // define L1 construct to attach the OriginAccessControl to the CloudFront Distribution
            var cfnOriginAccessControl = new CfnOriginAccessControl(this, "OriginAccessControl", new CfnOriginAccessControlProps
            {
                OriginAccessControlConfig = new OriginAccessControlConfigProperty
                {
                    Name = "ImageResize-OriginAccessControl",
                    OriginAccessControlOriginType = "s3",
                    SigningBehavior = "always",
                    SigningProtocol = "sigv4"
                }
            });

            // define CloudFront distribution to serve images from the bucket
            var cfnDistribution = new Distribution(this, "ImageBucketDistribution", new DistributionProps
            {
                DefaultBehavior = new BehaviorOptions
                {
                    Origin = new S3OacOrigin(s3Bucket, new S3OriginProps
                    {
                        OriginAccessIdentity = null,
                        ConnectionAttempts = 3,
                        ConnectionTimeout = Duration.Seconds(10),
                        // since Lambda@Edge doesn't support environment variables, we pass those as custom headers
                        // please note that this makes sense only for the OriginRequest event type, not for ViewerRequest
                        CustomHeaders = new Dictionary<string, string>
                        {
                            ["X-Env-Resized-Path"] = destinationFolder.ValueAsString,
                            ["X-Env-Bucket-Name"] = bucketName.ValueAsString,
                            ["X-Env-Quality"] = quality.ValueAsString
                        }
                    }),
                    CachePolicy = new CachePolicy(this, "ImageBucketCachePolicy", new CachePolicyProps
                    {
                        QueryStringBehavior = CacheQueryStringBehavior.AllowList("size", "to_webp"),
                        DefaultTtl = Duration.Days(1),
                        MaxTtl = Duration.Days(365),
                        MinTtl = Duration.Seconds(1),
                        EnableAcceptEncodingGzip = true
                    }),
                    ViewerProtocolPolicy = ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
                    EdgeLambdas = new[]
                    {
                        new EdgeLambda
                        {
                            EventType = LambdaEdgeEventType.ORIGIN_REQUEST,
                            FunctionVersion = imageResizeLambda.CurrentVersion
                        }
                    }
                }
            });

            s3Bucket.AddToResourcePolicy(new PolicyStatement(new PolicyStatementProps
            {
                Actions = ["s3:GetObject"],
                Principals = [new ServicePrincipal("cloudfront.amazonaws.com")],
                Effect = Effect.ALLOW,
                Resources = [s3Bucket.ArnForObjects("*")],
                Conditions = new Dictionary<string, object>
                {
                    ["StringEquals"] = new Dictionary<string, object>
                    {
                        ["AWS:SourceArn"] = $"arn:aws:cloudfront::{this.Account}:distribution/{cfnDistribution.DistributionId}"
                    }
                }
            }));

            // workaround using the L1 construct to attach the OriginAccessControl to the CloudFront Distribution
            var l1CfnDistribution = cfnDistribution.Node.DefaultChild as CfnDistribution;
            l1CfnDistribution.AddPropertyOverride("DistributionConfig.Origins.0.OriginAccessControlId", cfnOriginAccessControl.AttrId);
        }
    }

    public class S3OacOrigin : OriginBase
    {
        public S3OacOrigin(IBucket bucket, IOriginProps props = null) : base(bucket.BucketRegionalDomainName, props) { }

        // workaround: render an empty "OriginAccessIdentity" in the CloudFormation template so that no OAI is attached to the origin
        protected override IS3OriginConfigProperty RenderS3OriginConfig()
        {
            return new S3OriginConfigProperty
            {
                OriginAccessIdentity = ""
            };
        }
    }
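A note on the cache policy in the stack above: it allow-lists only the size and to_webp query parameters, so only these two should end up in the cache key and be forwarded to the function. The Python sketch below is a deliberately simplified model of that behaviour, not CloudFront's actual algorithm (the real cache key has more components), and `cache_key` is a hypothetical helper.

```python
def cache_key(uri: str, querystring: str, allowed=("size", "to_webp")) -> str:
    """Simplified model: keep only allow-listed parameters, in their original order."""
    kept = [p for p in querystring.split("&") if p.split("=", 1)[0] in allowed]
    return uri + ("?" + "&".join(kept) if kept else "")


# Parameters outside the allow list do not fragment the cache:
print(cache_key("/kitty.jpg", "size=M&utm_source=mail"))  # /kitty.jpg?size=M
print(cache_key("/kitty.jpg", "size=M"))                  # /kitty.jpg?size=M
# A different parameter order still yields a different key in this model:
print(cache_key("/kitty.jpg", "to_webp=1&size=M"))        # /kitty.jpg?to_webp=1&size=M
```

If this model holds for your distribution, generating URLs with a consistent parameter order keeps the number of cache misses, and therefore function invocations, low.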

How to deploy

cdk deploy --parameters BucketName=fastfoodcoding-edge --parameters DestinationFolder=resized --parameters Quality=80

Conclusion

In this article, we’ve learned how to resize images on the fly using CloudFront and Lambda@Edge. This approach allows us to reduce the traffic between CloudFront and our server, as well as decrease the page load time for our users. We’ve also faced some limitations of Lambda@Edge and CDK, but we’ve managed to overcome them. Feel free to experiment with the code and adapt it to your needs. Tasty coding!