Yubl’s road to Serverless architecture — Testing and CI/CD

part 1: overview

part 2: testing and CI/CD <- you're here

part 3: ops

part 4: building a scalable notification system

part 5: building a better recommendation system

Having spoken to quite a few people about using AWS Lambda in production, I've found that testing and CI/CD are always high up the list of questions, so I'd like to use this post to discuss the approaches that we took at Yubl.

Please keep in mind that this is a recollection of what we did, and why we chose to do things that way. I have heard others advocate very different approaches, and I'm sure they too have their reasons and their approaches no doubt work well for them. I hope to give you as much context (or, the "why") as I can so you can judge whether or not our approach would likely work for you. Feel free to ask questions in the comments section.



In Growing Object-Oriented Software, Guided by Tests, Nat Pryce and Steve Freeman talked about the 3 levels of testing [Chapter 1]:

  1. Acceptance — does the whole system work?
  2. Integration — does our code work against code we can't change?
  3. Unit — do our objects do the right thing, are they easy to work with?

As you move down the levels (acceptance -> unit), the speed of the feedback loop becomes faster, but you also have less confidence that your system will work correctly when deployed.

Favour Acceptance and Integration Tests

With the FaaS paradigm, there is more "code we can't change" than ever (AWS even describes Lambda as the "glue for your cloud infrastructure"), so the value of integration and acceptance tests is also higher than ever. Also, since the "code we can't change" is easily accessible as a service, these tests are far easier to orchestrate and write than before.

The functions we wrote tended to be fairly simple and didn't have complicated logic (most of the time), but there were a lot of them, and they were loosely connected through messaging systems (Kinesis, SNS, etc.) and APIs. The ROI for acceptance and integration tests is therefore far greater than for unit tests.

It's for these reasons that we decided (early on in our journey) to focus our efforts on writing acceptance and integration tests, and to only write unit tests where the internal workings of a Lambda function are sufficiently complex.

No Mocks

In Growing Object-Oriented Software, Guided by Tests, Nat Pryce and Steve Freeman also talked about why you shouldn't mock types that you can't change [Chapter 8], because…

…We find that tests that mock external libraries often need to be complex to get the code into the right state for the functionality we need to exercise.

The mess in such tests is telling us that the design isn't right but, instead of fixing the problem by improving the code, we have to carry the extra complexity in both code and test…

…The second risk is that we have to be sure that the behaviour we stub or mock matches what the external library will actually do…

Even if we get it right once, we have to make sure that the tests remain valid when we upgrade the libraries…

I believe the same principles apply here, and that you shouldn't mock services that you can't change.

Integration Tests

A Lambda function is ultimately a piece of code that AWS invokes on your behalf when some input event occurs. To test that it integrates correctly with downstream systems, you can invoke the function from your chosen test framework (we used Mocha).

Since the purpose is to test the integration points, it's important to configure the function to use the same downstream systems as the real, deployed code. If your function needs to read from/write to a DynamoDB table, then your integration test should be using the real table as opposed to something like dynamodb-local.

It does mean that your tests can leave artefacts in your integration environment and can cause problems when running multiple tests in parallel (e.g. the artefacts from one test affect the results of other tests), which is why, as a rule of thumb, I advocate:

  • avoid hard-coded IDs, as they often cause unintentional coupling between tests
  • always clean up artefacts at the end of each test

The same applies to accep­tance tests.
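
To make this concrete, here is a minimal sketch of what such an integration test might look like in Mocha. This is not the actual Yubl code: the save-user function, its handler path, the users-dev table and the document shape are hypothetical, but the shape of the test (invoke the handler in-process, verify against the real table, clean up afterwards) is the point.

// a minimal sketch of an integration test; save-user, its handler path and the
// users-dev table are hypothetical names used purely for illustration
const AWS          = require('aws-sdk');
const co           = require('co');
const { expect }   = require('chai');
const { v4: uuid } = require('uuid');

const dynamodb = new AWS.DynamoDB.DocumentClient({ region: process.env.AWS_REGION });
const handler  = require('../functions/save-user/handler');

describe('save-user', function () {
  this.timeout(10000);            // real AWS calls, so allow a generous timeout
  const userId = uuid();          // a random ID avoids coupling between tests

  it('writes the user to the real DynamoDB table', co.wrap(function* () {
    const event   = { userId, username: `test-${userId}` };
    const context = { functionName: 'save-user' };          // stubbed context object

    // invoke the handler in-process, the same way Lambda would (callback style)
    yield new Promise((resolve, reject) =>
      handler.handler(event, context, err => err ? reject(err) : resolve()));

    // read back from the same table the deployed function uses, not dynamodb-local
    const res = yield dynamodb.get({ TableName: 'users-dev', Key: { userId } }).promise();
    expect(res.Item).to.exist;
  }));

  // always clean up the artefacts the test left behind
  after(co.wrap(function* () {
    yield dynamodb.delete({ TableName: 'users-dev', Key: { userId } }).promise();
  }));
});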

Acceptance Tests

(this picture is slightly misleading in that the Mocha tests are not invoking the Lambda function programmatically, but rather invoking it indirectly via whatever input event the Lambda function is configured with — API Gateway, SNS, Kinesis, etc. More on this later.)

…Wherever possible, an acceptance test should exercise the system end-to-end without directly calling its internal code.

An end-to-end test interacts with the system only from the outside: through its interface…

…We prefer to have the end-to-end tests exercise both the system and the process by which it's built and deployed

This sounds like a lot of effort (it is), but has to be done anyway repeatedly during the software's lifetime…

- Growing Object-Oriented Software, Guided by Tests [Chapter 1]

Once the integration tests complete successfully, we have good confidence that our code will work correctly when it's deployed. The code is deployed, and the acceptance tests are run against the deployed system end-to-end.

Take our Search API, for instance: one of the acceptance criteria is "when a new user joins, he should be searchable by first name/last name/username".

The acceptance test first sets up the test condition — a new user joins — by interacting with the system from the outside and calling the legacy API like the client app would. From here, a new-user-joined event will be fired into Kinesis; a Lambda function would process the event and add a new document in the User index in CloudSearch; the test would validate that the user is searchable via the Search API.

Avoid Brittle Tests

Because a new user is added to CloudSearch asynchronously via a background process, it introduces eventual consistency to the system. This is a common challenge when you decouple features through events/messages. When testing these eventually consistent systems, you should avoid waiting for fixed time periods (see protip 5 below) as doing so makes your tests brittle.

In the “new user joins” test case, this means you shouldn’t write tests that:

  1. create new user
  2. wait 3 seconds
  3. validate user is searchable

and instead, write something along the lines of:

  1. create new user
  2. validate user is searchable with retries
    1. if expectation fails, then wait X seconds before retrying
    2. repeat
    3. allow Y retries before failing the test case
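
A rough sketch of that retry pattern is below. The init and when step modules, the search response shape and the X/Y values are stand-ins rather than the actual Yubl test code; the part that matters is retrying the assertion instead of sleeping for a fixed period.

// retry the assertion until it passes (or we run out of retries) instead of
// waiting a fixed amount of time; init/when are hypothetical step modules
const co         = require('co');
const { expect } = require('chai');
const init       = require('./steps/init');
const when       = require('./steps/when');

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

const eventually = co.wrap(function* (assertion, retries = 10, delayMs = 3000) {
  for (let attempt = 1; ; attempt++) {
    try {
      return yield assertion();
    } catch (err) {
      if (attempt >= retries) throw err;   // allow Y retries before failing the test
      yield sleep(delayMs);                // wait X seconds before retrying
    }
  }
});

it('a new user can be found by username', co.wrap(function* () {
  const user = yield init.new_user_joins();   // calls the legacy API like the client app would

  yield eventually(co.wrap(function* () {
    const res = yield when.we_search_for(user.username);               // hits the Search API
    expect(res.users.map(u => u.username)).to.include(user.username);  // response shape is assumed
  }));
}));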

Sharing test cases for Integration and Acceptance Testing

We also found that, most of the time, the only difference between our integration and acceptance tests is how our function code is invoked. Instead of duplicating a lot of code and effort, we used a simple technique to allow us to share the test cases.

Suppose you have a test case such as the one below.
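
The original post embeds the test as a gist; below is a sketch of roughly what such a shared test case could look like (the init step, the timeout and the assertions are assumptions made for illustration).

// a sketch of a shared test case; apart from we_invoke_get_all_keys and the
// TEST_MODE mechanism described below, the details here are illustrative
const co         = require('co');
const { expect } = require('chai');
const init       = require('./steps/init');
const when       = require('./steps/when');

const region = process.env.AWS_REGION || 'us-east-1';

describe('get-all-keys', function () {
  this.timeout(10000);

  before(co.wrap(function* () {
    yield init.ensure_test_data_exists(region);   // hypothetical setup step
  }));

  it('returns the keys', co.wrap(function* () {
    let res = yield when.we_invoke_get_all_keys(region);

    expect(res.statusCode).to.equal(200);
    expect(res.body).to.exist;
  }));
});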

The interesting bit is this line:

let res = yield when.we_invoke_get_all_keys(region);

In the when module, the function we_invoke_get_all_keys will either

  • invoke the function code directly with a stubbed context object, or
  • perform an HTTP GET request against the deployed API

depending on the value of process.env.TEST_MODE, which is an environment variable that is passed into the test via package.json (see below) or the bash script we use for deployment (more on this shortly).
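
Here is a rough sketch of how such a when module could be wired up. The handler path, the TEST_ROOT_URL variable, the response shape and the npm scripts shown in the comment are assumptions; only TEST_MODE and we_invoke_get_all_keys come from our actual setup.

// in package.json the npm scripts might set the mode along these lines:
//   "test:integration": "TEST_MODE=integration mocha tests",
//   "test:acceptance":  "TEST_MODE=acceptance mocha tests"
const https = require('https');

const we_invoke_get_all_keys = (region) =>
  process.env.TEST_MODE === 'integration'
    ? invoke_handler_directly(region)   // invoke the function code in-process
    : invoke_via_http();                // exercise the deployed API end-to-end

// integration mode: require the handler and call it with a stubbed context
// object, the same way Lambda would (a callback-style handler is assumed)
const invoke_handler_directly = (region) => {
  process.env.AWS_REGION = region;                                // so the handler talks to the right region
  const handler = require('../functions/get-all-keys/handler');   // hypothetical path
  const event   = {};                                             // shape depends on the trigger
  const context = { functionName: 'get-all-keys', awsRequestId: 'test-run' };

  return new Promise((resolve, reject) =>
    handler.handler(event, context, (err, res) => err ? reject(err) : resolve(res)));
};

// acceptance mode: perform an HTTP GET against the deployed API Gateway stage
const invoke_via_http = () => {
  const url = `${process.env.TEST_ROOT_URL}/keys`;   // set by build.sh for the target stage

  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      let body = '';
      res.on('data', chunk => body += chunk);
      res.on('end', () => resolve({ statusCode: res.statusCode, body }));
    }).on('error', reject);
  });
};

module.exports = { we_invoke_get_all_keys };

The same test file can then be run in either mode, which is what lets the integration and acceptance runs share the test cases.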


Continuous Integration + Continuous Delivery

We had around 170 Lambda functions running in production, and many of them worked together to provide different features to the app. Our approach was to group these functions such that:

  • functions that form the endpoints of an API are grouped in a project
  • background processing functions for a feature are grouped in a project
  • each project has its own repo
  • functions in a project are tested and deployed together

The rationale for this grouping strategy is to:

  • achieve high cohesion for related functions
  • improve code sharing where it makes sense (endpoints of an API are likely to share some logic since they operate within the same domain)

Although functions are grouped into projects, they can still be deployed individually. We chose to deploy them as a unit because:

  • it's simple, and all related functions (in a project) have the same version no.
  • it's difficult to detect which functions will be impacted by a change to shared code
  • deployment is fast; it makes little difference speed-wise whether we're deploying one function or five


For example, in the Yubl app, you have a feed of posts from people you follow (similar to your Twitter timeline).

To implement this feature there was an API (with multiple endpoints) as well as a bunch of background processing functions (connected to Kinesis streams and SNS topics).

The API has two endpoints, but they also share a common custom auth function, which is included as part of this project (and deployed together with the get and get-yubl functions).

The background processing functions (initially only Kinesis but later expanded to include SNS as well, though the repo wasn't renamed) share a lot of code, such as the distribute module, as well as a number of modules in the lib folder.

All of these functions are deployed together as a unit.

Deployment Automation

We used the Serverless framework to do all of our deployments, and it took care of packaging, uploading and versioning our Lambda functions and APIs. It's super useful and took care of most of the problems for us, but we still needed a thin layer around it to allow an AWS profile to be passed in and to include testing as part of the deployment process.

We could have scripted these steps on the CI server, but I have been burnt a few times by magic scripts that only exist on the CI server (and not in source control). To that end, every project has a simple build.sh script which gives you a common vocabulary to:

  • run unit/integration/acceptance tests
  • deploy your code

Our Jenkins build configs do very little and just invoke this script with different params.

Continuous Delivery

To this day I'm still confused by Continuous "Delivery" vs Continuous "Deployment". There seem to be several interpretations, but the one I have heard most often is that with continuous deployment every change that passes the automated tests is deployed to production automatically, whereas with continuous delivery every change is proven to be deployable but releasing it to production remains a manual decision.

Regardless of which definition is correct, what was most important to us was the ability to deploy our changes to production quickly and frequently.

Whilst there were no technical reasons why we couldn't deploy to production automatically, we didn't do that because:

  • it gives the QA team an opportunity to do thorough tests using actual client apps
  • it gives the management team a sense of control over what is being released and when (I'm not saying whether this is a good or bad thing, merely that it's what we wanted)

In our setup, there were two AWS accounts:

  • production
  • non-prod, which has 4 environments — dev, test, staging, demo

(dev for development, test for the QA team, staging for a production-like environment, and demo for private beta builds for investors, etc.)

In most cases, when a change is pushed to Bitbucket, all the Lambda functions in that project are automatically tested, deployed and promoted all the way through to the staging environment. The deployment to production is a manual process that can happen at our convenience, and we generally avoid deploying to production on Friday afternoon (for obvious reasons).



The approaches we have talked about worked pretty well for our team, but it was not without drawbacks.

In terms of development flow, the focus on integration and acceptance tests meant slower feedback loops and tests that took longer to execute. Also, because we didn't mock downstream services, we couldn't run tests without an internet connection, which was an occasional annoyance when you wanted to work during a commute.

These were explicit tradeoffs we made, and I stand by them even now; AFAIK everyone in the team feels the same way.


In terms of deployment, I really missed the ability to do canary releases, although this was offset by the fact that our user base was still relatively small and the speed with which one can deploy and roll back changes with Lambda functions was sufficient to limit the impact of a bad change.

Whilst AWS Lambda and API Gateway don't support canary releases out of the box, it is possible to put together a DIY solution for APIs using weighted routing in Route53. Essentially, you'll:

  • have a canary stage for API Gateway and its associated Lambda functions
  • deploy production builds to the canary stage first
  • use weighted routing in Route53 to direct X% of traffic to the canary stage
  • monitor metrics, and when you're happy with the canary build, promote it to production

Again, this would only work for APIs and not for background processing (SNS, Kinesis, S3, etc.).
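
As a rough illustration of the weighted-routing step, here is a sketch using the AWS SDK for JavaScript. The hosted zone ID, the domain names and the 90/10 split are placeholders, and in practice you may prefer alias records for API Gateway custom domains over the plain CNAMEs shown here.

// a sketch of shifting a slice of traffic to the canary stage with Route53
// weighted routing; all names and IDs below are placeholders
const AWS     = require('aws-sdk');
const route53 = new AWS.Route53();

const weightedRecord = (target, setIdentifier, weight) => ({
  Action: 'UPSERT',
  ResourceRecordSet: {
    Name: 'api.example.com',        // the domain the client apps call
    Type: 'CNAME',
    SetIdentifier: setIdentifier,   // distinguishes records within the weighted set
    Weight: weight,                 // relative share of traffic for this record
    TTL: 60,
    ResourceRecords: [{ Value: target }]
  }
});

const params = {
  HostedZoneId: 'Z1234567890ABC',   // placeholder hosted zone ID
  ChangeBatch: {
    Changes: [
      // 90% of traffic to the production stage, 10% to the canary stage
      weightedRecord('prod-api.example.com', 'production', 90),
      weightedRecord('canary-api.example.com', 'canary', 10)
    ]
  }
};

route53.changeResourceRecordSets(params).promise()
  .then(() => console.log('weighted routing updated'))
  .catch(err => console.error(err));

Dialling the weights up or down then lets you gradually promote the canary build, or pull it back if the metrics look wrong.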


So that's it folks, hope you've enjoyed this post. Feel free to leave a comment if you have any follow-up questions, or tell me what else you'd like to hear about in part 3.



