Yan Cui
An interesting requirement came up at work this week: we discussed potentially having to run our own URL shortener, because the Universal Links mechanism (in iOS 9 and above) requires a JSON manifest at
https://domain.com/apple-app-site-association
Since the OS doesn’t follow redirects, this manifest has to be hosted on the URL shortener’s root domain.
Owing to a limitation in AppsFlyer, it currently can’t shorten links when you have Universal Links configured for your app. Whilst we could switch to another vendor, that would mean more work for our (already stretched) client devs, and we really like AppsFlyer’s support for attributions.
Which brings us back to the question
“should we build a URL shortener?”
swiftly followed by
“how hard can it be to build a scalable URL shortener in 2017?”
Well, it turns out it wasn’t hard at all.
Lambda FTW
For this URL shortener we’ll need several things:
- a GET /{shortUrl} endpoint that will redirect you to the original URL
- a POST / endpoint that will accept an original URL and return the shortened URL
- an index.html page where someone can easily create short URLs
- a GET /apple-app-site-association endpoint that serves a static JSON response
all of which can be accomplished with API Gateway + Lambda.
Overall, this is the project structure I ended up with:
- using the Serverless framework’s aws-nodejs template
- each of the above endpoints has a corresponding handler function
- the index.html file is in the static folder
- the test cases are written in such a way that they can be used as both integration and acceptance tests
- there’s a build.sh script which facilitates running
  - integration tests, e.g. ./build.sh int-test {env} {region} {aws_profile}
  - acceptance tests, e.g. ./build.sh acceptance-test {env} {region} {aws_profile}
  - deployment, e.g. ./build.sh deploy {env} {region} {aws_profile}
GET /apple-app-site-association endpoint
Seeing as this is a static JSON blob, it makes sense to precompute the HTTP response and return it every time.
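A minimal sketch of what precomputing the response could look like, assuming API Gateway's Lambda proxy integration (the article doesn't say which integration is used). The manifest contents, including the app ID and paths, are placeholders for illustration only.

```javascript
'use strict';

// Placeholder manifest: the app ID and paths below are made up for illustration.
const manifest = {
  applinks: {
    apps: [],
    details: [
      { appID: 'TEAMID.com.example.app', paths: ['*'] }
    ]
  }
};

// Build the HTTP response once, when the module is loaded (i.e. at cold start),
// so that every invocation returns the same precomputed object.
const response = Object.freeze({
  statusCode: 200,
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(manifest)
});

// The handler does no work per request beyond returning the cached response.
const handler = async () => response;

module.exports = { handler };
```

Because the body is serialized at load time, each warm invocation just hands back the same object.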
POST / endpoint
For an algorithm to shorten URLs, you can find a very simple and elegant solution on StackOverflow. All you need is an auto-incremented ID, like the ones you normally get with RDBMS.
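As a rough illustration of that idea, the sketch below converts a numeric ID to a base-62 string and back. The alphabet ordering and the function names `encode` and `decode` are choices made for this example, not necessarily what the author's code uses.

```javascript
'use strict';

// 62 URL-safe characters; the ordering is an arbitrary choice for this sketch.
const ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
const BASE = ALPHABET.length;

// Convert a non-negative integer ID into a short base-62 string.
function encode(id) {
  if (!Number.isSafeInteger(id) || id < 0) {
    throw new RangeError('id must be a non-negative safe integer');
  }
  if (id === 0) {
    return ALPHABET[0];
  }
  let result = '';
  while (id > 0) {
    result = ALPHABET[id % BASE] + result; // prepend the least significant digit
    id = Math.floor(id / BASE);
  }
  return result;
}

// Convert a base-62 string back into the numeric ID.
function decode(shortUrl) {
  let id = 0;
  for (const ch of shortUrl) {
    const digit = ALPHABET.indexOf(ch);
    if (digit < 0) {
      throw new RangeError(`invalid character: ${ch}`);
    }
    id = id * BASE + digit;
  }
  return id;
}

module.exports = { encode, decode };
```

For example, `encode(125)` gives `'cb'` (125 = 2 × 62 + 1), and `decode('cb')` gives back `125`.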
However, I find DynamoDB a more appropriate DB choice here because:
- it’s a managed service, so no infrastructure for me to worry about
- OPEX over CAPEX, man!
- I can scale read and write throughput elastically to match utilization level and handle any spikes in traffic
But DynamoDB has no concept of an auto-incremented ID, which the algorithm needs. Instead, you can use an atomic counter to simulate an auto-incremented ID (at the expense of an extra write-unit per request).
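One way to sketch such an atomic counter is an `UpdateItem` call with an `ADD` expression that returns the new value. The table name, key, and attribute name below are hypothetical, and `docClient` is assumed to have the shape of the AWS SDK v2 `DocumentClient` (it is passed in rather than created here so the snippet runs without AWS credentials).

```javascript
'use strict';

// Hypothetical table, key and attribute names; the article doesn't give the real ones.
const COUNTER_TABLE = 'url-shortener-counters';
const COUNTER_KEY = { id: 'shortUrl' };

// `docClient` is assumed to look like the AWS SDK v2 DocumentClient, e.g.
//   const AWS = require('aws-sdk');
//   const docClient = new AWS.DynamoDB.DocumentClient();
async function nextId(docClient) {
  const res = await docClient.update({
    TableName: COUNTER_TABLE,
    Key: COUNTER_KEY,
    // ADD increments the number atomically on the DynamoDB side, and treats
    // a missing attribute as 0, so the first call returns 1.
    UpdateExpression: 'ADD #value :incr',
    ExpressionAttributeNames: { '#value': 'value' },
    ExpressionAttributeValues: { ':incr': 1 },
    // Ask DynamoDB to return the attribute as it is after the update.
    ReturnValues: 'UPDATED_NEW'
  }).promise();

  return res.Attributes.value;
}

module.exports = { nextId };
```

The returned number can then be turned into the short URL, and the mapping written to the URLs table in a second write, which is where the extra write-unit comes from.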
GET /{shortUrl} endpoint
Once we have the mapping in a DynamoDB table, the redirect endpoint is a simple matter of fetching the original URL and returning it as part of the Location header.
Oh, and don’t forget to return the appropriate HTTP status code, in this case a 308 Permanent Redirect.
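A minimal sketch of such a redirect handler, again assuming the Lambda proxy integration and an injected client shaped like the AWS SDK v2 `DocumentClient`. The table name and the `shortUrl` key attribute are hypothetical, and the 404 branch for unknown short URLs is an addition the article doesn't describe.

```javascript
'use strict';

// Hypothetical table name; the article doesn't give the real one.
const URLS_TABLE = 'url-shortener-urls';

// Factory that takes the DynamoDB client as an argument, so the handler can be
// exercised with a fake client in tests.
function createRedirectHandler(docClient) {
  return async (event) => {
    // With the proxy integration, the {shortUrl} path segment arrives here.
    const shortUrl = event.pathParameters.shortUrl;

    const res = await docClient.get({
      TableName: URLS_TABLE,
      Key: { shortUrl }
    }).promise();

    // Assumption: unknown short URLs get a 404 rather than an error.
    if (!res.Item) {
      return { statusCode: 404, body: 'Not found' };
    }

    return {
      statusCode: 308,
      headers: { Location: res.Item.longUrl },
      body: ''
    };
  };
}

module.exports = { createRedirectHandler };
```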
GET / index page
Finally, for the index page, we’ll need to return some HTML instead (and a different content-type to go with the HTML).
I decided to put the HTML file in a static folder, which is loaded and cached the first time the function is invoked.
Getting ready for production
Fortunately I have had plenty of practice getting Lambda functions to production readiness, and for this URL shortener we will need to:
- configure auto-scaling parameters for the DynamoDB table (we have an internal system that manages the auto-scaling side of things)
- turn on caching in API Gateway for the production stage
Future Improvements
If you put in the same URL multiple times you’ll get back different short URLs. One optimization (for storage and caching) would be to return the same short URL instead.
To accomplish this, you can:
- add a GSI to the DynamoDB table on the longUrl attribute to support efficient reverse lookup
- in the shortenUrl function, perform a Query against the GSI (a GSI can’t be read with GetItem) to find existing short URL(s)
I think it’s better to add a GSI than to create a new table here because it avoids having “transactions” that span across multiple tables.
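The reverse lookup could be sketched as a Query against the index. The table name, index name, and the `findExistingShortUrl` helper are hypothetical, and `docClient` is assumed to have the shape of the AWS SDK v2 `DocumentClient`.

```javascript
'use strict';

// Hypothetical table and index names; the article doesn't give the real ones.
const URLS_TABLE = 'url-shortener-urls';
const LONG_URL_INDEX = 'longUrl-index';

// Returns an existing short URL for `longUrl`, or null if none has been created.
async function findExistingShortUrl(docClient, longUrl) {
  const res = await docClient.query({
    TableName: URLS_TABLE,
    IndexName: LONG_URL_INDEX,
    KeyConditionExpression: 'longUrl = :longUrl',
    ExpressionAttributeValues: { ':longUrl': longUrl },
    Limit: 1 // one match is enough to reuse
  }).promise();

  // The table's key attributes are always projected into a GSI, so
  // shortUrl is available here (assuming it is the table's key).
  return res.Items.length > 0 ? res.Items[0].shortUrl : null;
}

module.exports = { findExistingShortUrl };
```

Reads from a GSI are eventually consistent, so two requests for the same long URL arriving close together could still produce two short URLs. For a storage and caching optimization that is probably acceptable.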
Useful Links
- Breaking down iOS 9 universal links
- slides for my talk on getting production ready with AWS Lambda
- my series on migrating Yubl to a serverless architecture
- SO: algorithm for shortening URLs
Thanks for the article. Do you have the code in Git? Instead of using an auto-increment ID, I will use an epoch timestamp.
There is some proprietary info in our solution so we can’t share it publicly. Using epoch is not safe concurrency-wise and requires you to do optimistic locking at the point of writing the mapping (and retrying the whole process), which is why we shied away from that. Also, datetime values in both Node.js and C# are granular to several milliseconds, so if you’re creating shortlinks in bulk (and in parallel) then you have a higher likelihood of running into that collision.
Thank you very much for the post. Very interesting information.
Regards! :)