🔥Let’s Do DevOps: CloudFront Lambda@Edge to Add CSP HTTP Headers
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
Hey all!
S3 is great for storing files, and CloudFront is great for global content delivery caching and speed. You can set up an incredible website with these two resources. However, your security team will soon ask you to add some Content Security Policy (CSP) HTTP headers to the traffic, and you’ll need to tell them that CloudFront doesn’t support that yet.
Note: CloudFront DOES now support CSP headers! And even arbitrary headers, which is awesome and easier than this method. You can still use this article for other Lambda@Edge use cases, and the new article for how to do CSP headers at the CloudFront layer.
Lambda@Edge integrates with CloudFront and can manipulate traffic in flight in a variety of ways, which can do some incredible things. In this blog we’ll shim in some HTTP headers from source S3 → CloudFront on the back-side of our CloudFront cache. This means the Lambda will only run when the cache is cleared and the S3 origin content is fetched (read: not very often), but we still get CSP header security.
Let’s do it!
Code will be boxed in and discussed inline, and all code will be runnable and linked to github repos at the end of this article.
Definitions
Before we dive into theory, let’s do some definitions to get us started.
S3 Bucket — AWS service to store files. Many other AWS services integrate with it
Lambda — AWS serverless service, able to run code when triggered. Also integrated with many other services
CloudFront — AWS Content Delivery Network (CDN), caches and distributes files to many points across the Earth for quick access from almost anywhere
CloudFront Origin — The Origin is where CloudFront is pulling files from. It needs an origin so it can cache the content and distribute it.
Caching — Caching is the concept of pulling data from somewhere and storing it. CloudFront does this by pulling data from your Origin (probably S3) and storing it within the CloudFront nodes all over the world to decrease user latency when accessing your stuff!
IAM — Identity and Access Management (IAM) policies control what each principal (user, role, AWS service) is permitted to do
Lambda@Edge Theory
Lambdas are a method to run code without provisioning any servers. They’re great for very quick atomic operations. The Lambda service supports any number of triggers, which permit us to run our lambda and have it do stuff. That stuff is up to you, since it’s your code that’s running!
Lambda@Edge is the concept of integrating Lambda deeply into CloudFront’s cache. It permits Lambdas to be run in 4 different specific circumstances:
Viewer Request — The public-side request coming into CloudFront is called the “viewer”, and we can modify the request before it’s processed by CloudFront
Viewer Response — Modify the data sent back to the viewer after the request is processed by CloudFront
Origin Request — Modify the cache refresh message sent to the Origin, which is usually an S3 bucket
Origin Response — Modify the data sent from the Origin
Now, in our use case, we want to modify the data provided to our users, which means we can modify either the Origin response or the Viewer response. Either would work perfectly, right? We can shim headers in at either place and the user would see them.
If we use the Viewer response, the lambda would run each time CloudFront is accessed, which could be… a lot! That might be expensive, and would add a little latency to the end user’s request to us.
If we use the Origin response, the lambda would run only when CloudFront dumps its cache and renews the data from our Origin. That happens pretty rarely, and the outcome is the same: the headers are cached by CloudFront along with the content, so the user still sees them. And this one saves us $$! Let’s use this one.
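To make the event types concrete, here’s a minimal sketch (not the shim we deploy later) of what a response-type handler receives. The field names follow the Lambda@Edge event structure; the console.log lines are just illustrative.
// sketch.js, an origin-response handler skeleton for illustration only
'use strict';

exports.handler = (event, context, callback) => {
    // Every Lambda@Edge invocation carries a single CloudFront record
    const cf = event.Records[0].cf;

    // config.eventType tells you which of the four hooks fired, e.g.
    // 'viewer-request', 'viewer-response', 'origin-request', or 'origin-response'
    console.log('Triggered by:', cf.config.eventType);

    // Response-type events (viewer-response, origin-response) expose cf.response,
    // which is what we'll mutate to add headers in the real shim later
    const response = cf.response;
    console.log('Origin returned status:', response.status);

    // Returning the (possibly modified) response hands control back to CloudFront
    callback(null, response);
};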
Hosting a Website in S3
First of all, let’s build our S3 bucket and put our website on it. Our site will be a single index.html page for this demo. In the real world, sites are tested and deployed via content management tools because there may be thousands or millions of pages to host.
We create a locals file with our S3 bucket name because we’ll be referencing it a few times, and then we create our S3 bucket. In previous iterations of the AWS Terraform provider, the S3 bucket resource was a monolith and had a ton of configuration in it. In the new versions, each configuration block for S3 is a separate resource. We’ll go over that shortly.
We also set our S3 bucket to use the private ACL, which means access isn’t permitted except where our policy permits it.
locals {
  s3_bucket_name = "kyler-s3-bucket-asdfas"
}
resource "aws_s3_bucket" "s3_bucket" {
  bucket = local.s3_bucket_name
  tags = {
    Name = local.s3_bucket_name
  }
}
resource "aws_s3_bucket_acl" "s3_bucket_acl" {
  bucket = aws_s3_bucket.s3_bucket.id
  acl    = "private"
}
Next we create an S3 bucket policy to permit only CloudFront and assign it to our bucket. Our goal is that folks can’t bypass our security layer (CloudFront) and access our files directly in the S3 bucket.
resource "aws_s3_bucket_policy" "s3_bucket_policy" { | |
bucket = aws_s3_bucket.s3_bucket.id | |
policy = data.aws_iam_policy_document.cloudfront_access.json | |
} | |
data "aws_iam_policy_document" "cloudfront_access" { | |
statement { | |
sid = "PermitCloudfrontOnly" | |
principals { | |
type = "AWS" | |
identifiers = [ | |
aws_cloudfront_origin_access_identity.origin_access_identity.iam_arn | |
] | |
} | |
actions = [ | |
"s3:GetObject", | |
] | |
resources = [ | |
"arn:aws:s3:::${local.s3_bucket_name}/*", | |
] | |
} | |
} |
We create a flat index.html page to represent our website.
<HTML>
<HEAD>
<TITLE>Website for testing CloudFront Lambda</TITLE>
</HEAD>
<BODY BGCOLOR="FFFFFF">
<HR>
<H1>Congratulations!</H1>
Your website appears to be working!! <BR>
<HR>
</BODY>
</HTML>
We have Terraform grab that index.html file and upload it to the S3 bucket. The etag attribute acts as a trigger, and will cause the file to be re-uploaded to the bucket if its MD5 hash changes in any way.
resource "aws_s3_object" "website" { | |
bucket = aws_s3_bucket.s3_bucket.id | |
key = "index.html" | |
acl = "private" | |
source = "index.html" | |
etag = filemd5("index.html") | |
content_type = "text/html" | |
} |
We tell the S3 bucket to act as a web server:
resource "aws_s3_bucket_website_configuration" "web_hosting" { | |
bucket = aws_s3_bucket.s3_bucket.id | |
index_document { | |
suffix = "index.html" | |
} | |
} |
And then we tell terraform to invalidate our CloudFront cache each time this file is updated. That means if we modify our index.html file, then run terraform, it’ll:
Update the S3 bucket with our new file
Tell CloudFront to invalidate its cache, which triggers CloudFront to fetch the new version and start serving it. All in < 30 seconds!
resource "null_resource" "invalidate_cf_cache" { | |
provisioner "local-exec" { | |
command = "aws cloudfront create-invalidation --distribution-id ${aws_cloudfront_distribution.s3_distribution.id} --paths '/*'" | |
} | |
triggers = { | |
website_version_changed = aws_s3_object.website.version_id | |
} | |
} |
Building a Lambda To Modify HTTP Headers
Next we need to write a lambda to shim in HTTP headers. We’ll use NodeJS. Let’s start with our lambda.js file. See how each header is set on its own line, and can be updated to enforce your HTTP security policy. Save this in the same directory where your TF lives.
'use strict';
exports.handler = (event, context, callback) => {
  //Get contents of response
  const response = event.Records[0].cf.response;
  const headers = response.headers;
  //Set new headers (object keys are the lowercase header names)
  headers['strict-transport-security'] = [{key: 'Strict-Transport-Security', value: 'max-age=63072000; includeSubdomains; preload'}];
  headers['content-security-policy'] = [{key: 'Content-Security-Policy', value: "default-src 'self' *.domain1.com *.domain2.com; font-src 'self' fonts.gstatic.com; frame-src *.domain1.com *.domain2.net; img-src 'self' data: *.domain1.com *.domain2.com; object-src 'none'; script-src 'self' 'unsafe-inline' 'unsafe-eval' *.domain1.com *.domain2.com; style-src 'self' 'unsafe-inline' *.domain1.com *.domain2.com;"}];
  headers['x-content-type-options'] = [{key: 'X-Content-Type-Options', value: 'nosniff'}];
  headers['x-frame-options'] = [{key: 'X-Frame-Options', value: 'DENY'}];
  headers['x-xss-protection'] = [{key: 'X-XSS-Protection', value: '1; mode=block'}];
  headers['referrer-policy'] = [{key: 'Referrer-Policy', value: 'same-origin'}];
  headers['cache-control'] = [{key: 'Cache-Control', value: 'max-age=31536000'}];
  //Return modified response
  callback(null, response);
};
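Before wiring this into CloudFront, you can smoke test the handler locally with a hand-built event. This is just a sketch with an assumed file name (local-test.js) and a minimal fake event shaped like the Lambda@Edge origin-response payload:
// local-test.js, a quick local smoke test for lambda.js (assumed filename, not deployed)
'use strict';
const { handler } = require('./lambda');

// A minimal fake origin-response event: just enough structure for our handler
const fakeEvent = {
  Records: [{ cf: { response: { status: '200', headers: {} } } }]
};

handler(fakeEvent, {}, (err, response) => {
  if (err) throw err;
  // Should print the shimmed security headers
  console.log(JSON.stringify(response.headers, null, 2));
});
Run it with node local-test.js and you should see the same headers we’ll later see from curl.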
Lambda requires our code to be zipped up, so first we use an archive_file data source to zip our lambda.js file into lambda.zip.
Then we create our lambda with the aws_lambda_function resource. There are a few interesting things here. The filename must match the name of our zip file. The role is our Lambda IAM role, which we’ll cover next.
The handler was a new concept to me before this project. It’s basically our code ingress point, in case your NodeJS code contains multiple functions you might want to start with. The format is (file_name).(exported_function_name), and since our export is simply named handler, ours is lambda.handler.
The publish attribute must be true, because CloudFront can’t access Lambda versions unless they are published. The source_code_hash checks the base64sha256 hash of our zip to see if it’s changed, and if yes, it’ll publish a new version of the lambda. And runtime is the specific version of NodeJS we want to run our code.
# Zip up lambda when it changes
data "archive_file" "lambda" {
  type        = "zip"
  source_file = "lambda.js"
  output_path = "lambda.zip"
}
resource "aws_lambda_function" "lambda_function" {
  filename      = "lambda.zip"
  function_name = "HttpHeaderShim"
  role          = aws_iam_role.lambda_role.arn
  handler       = "lambda.handler"
  # Publishes new versions on source update, required for Lambda@Edge with CloudFront
  publish = true
  # Hash the source file, if it changes, lambda will be updated
  source_code_hash = data.archive_file.lambda.output_base64sha256
  runtime          = "nodejs12.x"
}
We want CloudFront to be able to trigger our Lambda when it calls it. The service principal Amazon uses for this is edgelambda.amazonaws.com, so we add it to the AWS services which are able to assume this role and do things.
resource "aws_iam_role" "lambda_role" { | |
name = "HttpHeaderShimLambdaRole" | |
assume_role_policy = jsonencode({ | |
Version = "2012-10-17" | |
Statement = [ | |
{ | |
Action = "sts:AssumeRole" | |
Effect = "Allow" | |
Sid = "" | |
Principal = { | |
Service = [ | |
"lambda.amazonaws.com", | |
"edgelambda.amazonaws.com", | |
] | |
} | |
} | |
] | |
}) | |
} |
Then we create our permissions policy for the lambda, which permits our lambda to write logs when it’s run, which is great for troubleshooting. Then we link the two together.
resource "aws_iam_policy" "lambda_policy" { | |
name = "HttpHeaderShimLambdaPolicy" | |
description = "Http Header Shim Lambda Policy" | |
policy = jsonencode({ | |
"Version" : "2012-10-17", | |
"Statement" : [ | |
{ | |
"Action" : [ | |
"cloudwatch:PutMetricData", | |
"logs:CreateLogGroup", | |
"logs:CreateLogStream", | |
"logs:PutLogEvents" | |
], | |
"Resource" : [ | |
"*" | |
], | |
"Effect" : "Allow", | |
"Sid" : "CloudWatchPutMetricData" | |
} | |
] | |
}) | |
} | |
resource "aws_iam_role_policy_attachment" "lambda-policy" { | |
role = aws_iam_role.lambda_role.name | |
policy_arn = aws_iam_policy.lambda_policy.arn | |
} |
CloudFront with Lambda@Edge and Caching
Now for the good stuff, let’s build some CloudFront, tell it to source all files from our S3 bucket, and integrate it with our fancy lambda.
Origin ID is a local mapping name for where we’ll grab our files from. We’ll reference it a few times, so we set up a local name for it.
Then we build a CloudFront Origin Access Identity (you’ll see this called an OAI all over AWS). This is the identity CloudFront uses to read from our private S3 bucket, and it’s the same principal we permitted in the bucket policy earlier, so nothing else can fetch our files directly.
locals {
  s3_origin_id = "S3OriginId"
}
resource "aws_cloudfront_origin_access_identity" "origin_access_identity" {
  comment = "CF Origin Access Identity"
}
There’s a lot going on here, so let’s walk through it piece by piece.
data "aws_cloudfront_cache_policy" — We use a data call to look up the managed CachingOptimized cache policy, so we can reference its ID in our distribution without hardcoding it.
origin — We set the origin for our CloudFront to our S3 bucket, using the OAI we built above.
default_root_object — The default root object of our CloudFront is index.html, which means if anyone goes directly to our CloudFront distribution they’ll be served the index.html page.
default_cache_behavior — Defines the default cache behavior, which is to cache GET and HEAD, pull files from the S3 origin, compress everything for storage in CloudFront, and use the caching policy we looked up above.
lambda_function_association — This is where we associate our lambda and call out the “origin-response” Lambda@Edge event type.
viewer_certificate — Says to use the default CloudFront https cert, which is perfect for a lab (but probably not what you want in your enterprise!).
data "aws_cloudfront_cache_policy" "caching_optimized" { | |
name = "Managed-CachingOptimized" | |
} | |
resource "aws_cloudfront_distribution" "s3_distribution" { | |
origin { | |
domain_name = aws_s3_bucket.s3_bucket.bucket_regional_domain_name | |
origin_id = local.s3_origin_id | |
s3_origin_config { | |
origin_access_identity = aws_cloudfront_origin_access_identity.origin_access_identity.cloudfront_access_identity_path | |
} | |
} | |
enabled = true | |
is_ipv6_enabled = false | |
comment = "S3 CloudFront" | |
default_root_object = "index.html" | |
default_cache_behavior { | |
allowed_methods = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"] | |
cached_methods = ["GET", "HEAD"] | |
target_origin_id = local.s3_origin_id | |
compress = true | |
cache_policy_id = data.aws_cloudfront_cache_policy.caching_optimized.id | |
viewer_protocol_policy = "allow-all" | |
# For default responses (not http-->https redirect), trigger lambda on response from origin | |
# Could also modify on viewer response, but origin-response will cache headers and limit lambda run volume | |
lambda_function_association { | |
event_type = "origin-response" | |
lambda_arn = aws_lambda_function.lambda_function.qualified_arn | |
} | |
} | |
price_class = "PriceClass_200" | |
restrictions { | |
geo_restriction { | |
restriction_type = "whitelist" | |
locations = ["US"] | |
} | |
} | |
viewer_certificate { | |
cloudfront_default_certificate = true | |
} | |
} |
And then we output the CloudFront distribution’s domain name with the https:// prefix so it’s a clickable link.
output "site_can_be_reached_at" { | |
value = "https://${aws_cloudfront_distribution.s3_distribution.domain_name}/" | |
} |
Test It Out!
After we run terraform and it builds all the things, we can curl the CloudFront distribution to test whether the new headers exist, and boom, there we are! The strict-transport-security through referrer-policy lines are our headers, injected by our NodeJS lambda and cached in CloudFront before being sent to our client. Amazing.
curl -vvv https://xxxxxxxx.cloudfront.net/ 2>&1
(removed)
< strict-transport-security: max-age=63072000; includeSubdomains; preload
< content-security-policy: default-src 'self' *.domain1.com *.domain2.com; font-src 'self' fonts.gstatic.com; frame-src *.domain1.com *.domain2.net; img-src 'self' data: *.domain1.com *.domain2.com; object-src 'none'; script-src 'self' 'unsafe-inline' 'unsafe-eval' *.domain1.com *.domain2.com; style-src 'self' 'unsafe-inline' *.domain1.com *.domain2.com;
< x-content-type-options: nosniff
< x-frame-options: DENY
< x-xss-protection: 1; mode=block
< referrer-policy: same-origin
(removed)
<HTML>
<HEAD>
<TITLE>Website for testing CloudFront Lambda</TITLE>
</HEAD>
<BODY BGCOLOR="FFFFFF">
<HR>
<H1>Congratulations!</H1>
Your website appears to be working!! <BR>
<HR>
</BODY>
</HTML>
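If you’d rather not eyeball curl output, here’s a small optional sketch that checks the headers programmatically. The file name check-headers.js and the placeholder CloudFront domain are assumptions; swap in your own distribution’s domain from the Terraform output.
// check-headers.js, an optional sketch to verify the shimmed headers (assumed filename)
'use strict';
const https = require('https');

// Replace with your distribution's domain from the site_can_be_reached_at output
const url = 'https://xxxxxxxx.cloudfront.net/';

const expected = [
  'strict-transport-security',
  'content-security-policy',
  'x-content-type-options',
  'x-frame-options',
  'x-xss-protection',
  'referrer-policy',
];

https.get(url, (res) => {
  // res.headers keys are lowercased by Node, matching our expected list
  const missing = expected.filter((name) => !(name in res.headers));
  console.log(missing.length === 0
    ? 'All security headers present!'
    : `Missing headers: ${missing.join(', ')}`);
  res.resume(); // drain the body so the connection closes cleanly
}).on('error', (err) => console.error(err));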
Summary
We covered Lambda@Edge architecture principles, then built an S3 bucket with a policy to permit only CloudFront, a lambda in NodeJS to shim CSP and other security HTTP headers in, and a CloudFront distribution, then linked CloudFront to the Lambda using Lambda@Edge.
Then we used curl to confirm that all our headers are shimmed in and cached. And we’re off to the races!
In an upcoming article I’ll deploy the new CloudFront header shim configuration that no longer utilizes lambda. Until then,
Good luck out there!
kyler