Yubl’s road to Serverless architecture — Testing and CI/CD

part 1: overview

part 2: test­ing and CI/CD <- you’re here

part 3: ops

part 4: build­ing a scal­able noti­fi­ca­tion sys­tem

part 5: build­ing a bet­ter rec­om­men­da­tion sys­tem

Hav­ing spo­ken to quite a few peo­ple about using AWS Lamb­da in pro­duc­tion, test­ing and CI/CD are always high up the list of ques­tions, so I’d like to use this post to dis­cuss the approach­es that we took at Yubl.

Please keep in mind that this is a rec­ol­lec­tion of what we did, and why we chose to do things that way. I have heard oth­ers advo­cate very dif­fer­ent approach­es, and I’m sure they too have their rea­sons and their approach­es no doubt work well for them. I hope to give you as much con­text (or, the “why”) as I can so you can judge whether or not our approach would like­ly work for you, and feel free to ask ques­tions in the com­ments sec­tion.



In Grow­ing Object-Ori­ent­ed Soft­ware, Guid­ed by Tests, Nat Pryce and Steve Free­man talked about the 3 lev­els of test­ing [Chap­ter 1]:

  1. Accep­tance — does the whole sys­tem work?
  2. Inte­gra­tion — does our code work against code we can’t change?
  3. Unit — do our objects do the right thing, are they easy to work with?

As you move through the levels (acceptance -> unit) the feedback loop becomes faster, but you also have less confidence that your system will work correctly when deployed.

Favour Acceptance and Integration Tests

With the FaaS paradigm, there is more “code we can’t change” than ever (AWS even describes Lambda as the “glue for your cloud infrastructure”), so the value of integration and acceptance tests is also higher than ever. And because the “code we can’t change” is easily accessible as services, these tests are far easier to orchestrate and write than before.

The functions we tended to write were fairly simple and didn’t have complicated logic (most of the time), but there were a lot of them, and they were loosely connected through messaging systems (Kinesis, SNS, etc.) and APIs. The ROI for acceptance and integration tests is therefore far greater than for unit tests.

It’s for these reasons that we decided (early on in our journey) to focus our efforts on writing acceptance and integration tests, and only write unit tests where the internal workings of a Lambda function are sufficiently complex.

No Mocks

In Growing Object-Oriented Software, Guided by Tests, Nat Pryce and Steve Freeman also talked about why you shouldn’t mock types that you can’t change [Chapter 8], because…

…We find that tests that mock exter­nal libraries often need to be com­plex to get the code into the right state for the func­tion­al­i­ty we need to exer­cise.

The mess in such tests is telling us that the design isn’t right but, instead of fix­ing the prob­lem by improv­ing the code, we have to car­ry the extra com­plex­i­ty in both code and test…

…The sec­ond risk is that we have to be sure that the behav­iour we stub or mock match­es what the exter­nal library will actu­al­ly do…

Even if we get it right once, we have to make sure that the tests remain valid when we upgrade the libraries…

I believe the same prin­ci­ples apply here, and that you shouldn’t mock ser­vices that you can’t change.

Integration Tests

A Lambda function is ultimately a piece of code that AWS invokes on your behalf when some input event occurs. To test that it integrates correctly with downstream systems you can invoke the function from your chosen test framework (we used Mocha).

Since the purpose is to test the integration points, it’s important to configure the function to use the same downstream systems as the real, deployed code. If your function needs to read from or write to a DynamoDB table, then your integration test should use the real table as opposed to something like dynamodb-local.

It does mean that your tests can leave artefacts in your integration environment, which can cause problems when running multiple tests in parallel (e.g. the artefacts from one test affect the results of other tests). This is why, as a rule of thumb, I advocate:

  • avoid hard-cod­ed IDs, they often cause unin­ten­tion­al cou­pling between tests
  • always clean up arte­facts at the end of each test

The same applies to accep­tance tests.
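As a minimal sketch of those two rules, a test can generate a fresh ID on every run and wrap the test body so that cleanup always happens. The names uniqueId and withTempRecord, and the create/remove callbacks, are hypothetical and not taken from our codebase; in a real test, create and remove would talk to the real table or API.

```javascript
// Generate an ID that is different on every run, so tests running in
// parallel never collide on a hard-coded value.
const uniqueId = (prefix) =>
  `${prefix}-${Date.now()}-${Math.random().toString(36).slice(2, 10)}`;

// Create a record, run the test body against it, and always clean up.
// `create` and `remove` are supplied by the caller (assumption: both are
// async functions that take the record ID).
async function withTempRecord({ create, remove }, testBody) {
  const id = uniqueId('test-user');
  await create(id);
  try {
    return await testBody(id);
  } finally {
    // Runs whether the assertions in testBody passed or threw.
    await remove(id);
  }
}
```

Because the cleanup sits in a finally block, a failing assertion still removes the artefact before the error reaches the test framework.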

Acceptance Tests

(this pic­ture is slight­ly mis­lead­ing in that the Mocha tests are not invok­ing the Lamb­da func­tion pro­gram­mat­i­cal­ly, but rather invok­ing it indi­rect­ly via what­ev­er input event the Lamb­da func­tion is con­fig­ured with — API Gate­way, SNS, Kine­sis, etc. More on this lat­er.)

…Wher­ev­er pos­si­ble, an accep­tance test should exer­cise the sys­tem end-to-end with­out direct­ly call­ing its inter­nal code.

An end-to-end test inter­acts with the sys­tem only from the out­side: through its inter­face…

…We pre­fer to have the end-to-end tests exer­cise both the sys­tem and the process by which it’s built and deployed

This sounds like a lot of effort (it is), but has to be done any­way repeat­ed­ly dur­ing the software’s life­time…

- Grow­ing Object-Ori­ent­ed Soft­ware, Guid­ed by Tests [Chap­ter 1]

Once the inte­gra­tion tests com­plete suc­cess­ful­ly, we have good con­fi­dence that our code will work cor­rect­ly when it’s deployed. The code is deployed, and the accep­tance tests are run against the deployed sys­tem end-to-end.

Take our Search API for instance. One of the acceptance criteria is “when a new user joins, he should be searchable by first name/last name/username”.

The accep­tance test first sets up the test con­di­tion — a new user joins — by inter­act­ing with the sys­tem from the out­side and call­ing the lega­cy API like the client app would. From here, a new-user-joined event will be fired into Kine­sis; a Lamb­da func­tion would process the event and add a new doc­u­ment in the User index in Cloud­Search; the test would val­i­date that the user is search­able via the Search API.

Avoid Brittle Tests

Because a new user is added to Cloud­Search asyn­chro­nous­ly via a back­ground process, it intro­duces even­tu­al con­sis­ten­cy to the sys­tem. This is a com­mon chal­lenge when you decou­ple fea­tures through events/messages. When test­ing these even­tu­al­ly con­sis­tent sys­tems, you should avoid wait­ing fixed time peri­ods (see pro­tip 5 below) as it makes your tests brit­tle.

In the “new user joins” test case, this means you shouldn’t write tests that:

  1. cre­ate new user
  2. wait 3 sec­onds
  3. val­i­date user is search­able

and instead, write some­thing along the lines of:

  1. cre­ate new user
  2. val­i­date user is search­able with retries
    1. if expec­ta­tion fails, then wait X sec­onds before retry­ing
    2. repeat
    3. allow Y retries before fail­ing the test case
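The retry steps above could be sketched as a small helper like the one below. This is an illustration rather than the code we used: the name withRetries and its options are assumptions, with delayMs standing in for “X” and maxRetries for “Y”. The given and when helpers in the usage comment are also hypothetical.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async assertion; if it throws, wait and try again.
// Makes 1 initial attempt plus up to `maxRetries` retries.
async function withRetries(assertion, { maxRetries = 5, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await assertion(); // expectation met, stop retrying
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        await sleep(delayMs); // wait X before the next attempt
      }
    }
  }
  throw lastError; // every attempt failed, so fail the test case
}

// Usage inside a Mocha test (helpers are hypothetical):
//
// it('new user is searchable by first name', async () => {
//   const user = await given.a_new_user_joins();
//   await withRetries(async () => {
//     const res = await when.we_search_users(user.firstName);
//     expect(res.users.map((u) => u.id)).to.include(user.id);
//   }, { maxRetries: 10, delayMs: 500 });
// });
```

The test passes as soon as the user becomes searchable, so the happy path stays fast, and it only fails once the full retry budget has been used up.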

Sharing test cases for Integration and Acceptance Testing

We also found that, most of the time the only dif­fer­ence between our inte­gra­tion and accep­tance tests is how our func­tion code is invoked. Instead of dupli­cat­ing a lot of code and effort, we used a sim­ple tech­nique to allow us to share the test cas­es.

Sup­pose you have a test case such as the one below.

The inter­est­ing bit is on line 22:

let res = yield when.we_invoke_get_all_keys(region);

In the when mod­ule, the func­tion we_invoke_get_all_keys will either

  • invoke the func­tion code direct­ly with a stubbed con­text object, or
  • perform an HTTP GET request against the deployed API

depend­ing on the val­ue of process.env.TEST_MODE, which is an envi­ron­ment vari­able that is passed into the test via package.json (see below) or the bash script we use for deploy­ment (more on this short­ly).
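To give a feel for how such a step can switch between the two modes, here is a rough sketch of what a when module might look like. It is not our actual module: the TEST_MODE values ('handler' and 'http'), the handler path, the TEST_ROOT_URL variable, the /keys route and the event shape are all assumptions made for the example. It uses the global fetch available in Node 18+.

```javascript
// Hypothetical defaults; a real project would point these at its own
// handler module and deployed API root.
const defaults = {
  loadHandler: () => require('../functions/get-all-keys').handler,
  apiRoot: process.env.TEST_ROOT_URL,
};

// Integration mode: call the function code directly with a stubbed context.
function viaHandler(handler, event) {
  return new Promise((resolve, reject) => {
    const context = {}; // stubbed context object
    handler(event, context, (err, res) => (err ? reject(err) : resolve(res)));
  });
}

// Acceptance mode: go through the deployed API, like a client would.
async function viaHttp(url) {
  const res = await fetch(url);
  // Return the same shape as a Lambda proxy response so that the test
  // case can use one set of assertions for both modes.
  return { statusCode: res.status, body: await res.text() };
}

async function we_invoke_get_all_keys(region, deps = defaults) {
  const mode = process.env.TEST_MODE;
  if (mode === 'handler') {
    const event = { queryStringParameters: { region } };
    return viaHandler(deps.loadHandler(), event);
  }
  if (mode === 'http') {
    const query = `region=${encodeURIComponent(region)}`;
    return viaHttp(`${deps.apiRoot}/keys?${query}`);
  }
  throw new Error(`unknown TEST_MODE: ${mode}`);
}
```

The test case itself never needs to know which mode it is running in; it only sees a statusCode and a body.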


Continuous Integration + Continuous Delivery

We had around 170 Lambda functions running in production, and many of them worked together to provide different features to the app. Our approach was to group these functions such that:

  • func­tions that form the end­points of an API are grouped in a project
  • back­ground pro­cess­ing func­tions for a fea­ture are grouped in a project
  • each project has its own repo
  • func­tions in a project are test­ed and deployed togeth­er

The ratio­nale for this group­ing strat­e­gy is to:

  • achieve high cohe­sion for relat­ed func­tions
  • improve code shar­ing where it makes sense (end­points of an API are like­ly to share some log­ic since they oper­ate with­in the same domain)

Although func­tions are grouped into projects, they can still be deployed indi­vid­u­al­ly. We chose to deploy them as a unit because:

  • it’s simple, and all related functions (in a project) have the same version number
  • it’s difficult to detect which functions will be impacted by a change to shared code
  • deployment is fast, so it makes little difference speed-wise whether we’re deploying one function or five


For exam­ple, in the Yubl app, you have a feed of posts from peo­ple you fol­low (sim­i­lar to your Twit­ter time­line).

To imple­ment this fea­ture there was an API (with mul­ti­ple end­points) as well as a bunch of back­ground pro­cess­ing func­tions (con­nect­ed to Kine­sis streams and SNS top­ics).

The API has two end­points, but they also share a com­mon cus­tom auth func­tion, which is includ­ed as part of this project (and deployed togeth­er with the get and get-yubl func­tions).

The background processing functions (initially only Kinesis but later expanded to include SNS as well, though the repo wasn’t renamed) share a lot of code, such as the distribute module you see below, as well as a number of modules in the lib folder.

All of these func­tions are deployed togeth­er as a unit.

Deployment Automation

We used the Serverless framework to do all of our deployments, and it took care of packaging, uploading and versioning our Lambda functions and APIs. It’s super useful and took care of most of the problem for us, but we still needed a thin layer around it to allow an AWS profile to be passed in and to include testing as part of the deployment process.

We could have script­ed these steps on the CI serv­er, but I have been burnt a few times by mag­ic scripts that only exist on the CI serv­er (and not in source con­trol). To that end, every project has a sim­ple build.sh script (like the one below) which gives you a com­mon vocab­u­lary to:

  • run unit/integration/acceptance tests
  • deploy your code

Our Jenk­ins build con­figs do very lit­tle and just invoke this script with dif­fer­ent params.

Continuous Delivery

To this day I’m still confused by Continuous “Delivery” vs Continuous “Deployment”. There seem to be several interpretations, but this is the one that I have heard the most often:

Regard­less of which def­i­n­i­tion is cor­rect, what was most impor­tant to us was the abil­i­ty to deploy our changes to pro­duc­tion quick­ly and fre­quent­ly.

Whilst there were no tech­ni­cal rea­sons why we couldn’t deploy to pro­duc­tion auto­mat­i­cal­ly, we didn’t do that because:

  • it gives the QA team the opportunity to do thorough tests using actual client apps
  • it gives the man­age­ment team a sense of con­trol over what is being released and when (I’m not say­ing if this is a good or bad thing, but mere­ly what we want­ed)

In our set­up, there were two AWS accounts:

  • pro­duc­tion
  • non-prod, which has 4 envi­ron­ments — dev, test, stag­ing, demo

(dev for development, test for the QA team, staging is a production-like environment, and demo for private beta builds for investors, etc.)

In most cases, when a change is pushed to Bitbucket, all the Lambda functions in that project are automatically tested, deployed and promoted all the way through to the staging environment. The deployment to production is a manual process that can happen at our convenience, and we generally avoid deploying to production on Friday afternoons (for obvious reasons).



The approaches we have talked about worked pretty well for our team, but they were not without drawbacks.

In terms of development flow, the focus on integration and acceptance tests meant slower feedback loops, as the tests take longer to execute. Also, because we didn’t mock downstream services, we couldn’t run tests without an internet connection, which is an occasional annoyance when you want to work during a commute.

These were explicit tradeoffs we made. I stand by them even now and, AFAIK, everyone in the team feels the same way.


In terms of deployment, I really missed the ability to do canary releases, although this was offset by the fact that our user base was still relatively small and the speed with which one can deploy and roll back changes with Lambda functions was sufficient to limit the impact of a bad change.

Whilst AWS Lambda and API Gateway don’t support canary releases out-of-the-box, it is possible to build a DIY solution for APIs using weighted routing in Route53. Essentially you’ll:

  • create a canary stage for API Gateway and the associated Lambda functions
  • deploy production builds to the canary stage first
  • use weighted routing in Route53 to direct X% of traffic to the canary stage
  • monitor metrics, and when you’re happy with the canary build, promote it to production

Again, this would only work for APIs and not for back­ground pro­cess­ing (SNS, Kine­sis, S3, etc.).
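For the weighted routing step, the sketch below builds the parameters for a Route53 ChangeResourceRecordSets call with two weighted records, one for production and one for the canary. We never built this at Yubl, so treat it as an untested idea: the function name, the record names and the assumption that each stage sits behind its own custom domain name are all mine. Route53 sends each record a share of traffic equal to its weight divided by the sum of the weights, so weights of 100 - X and X give the canary X%.

```javascript
// Build params for the Route53 ChangeResourceRecordSets API.
// All argument values are supplied by the caller; nothing here calls AWS.
function weightedCanaryChanges({
  hostedZoneId,
  recordName,
  productionTarget,
  canaryTarget,
  canaryPercent,
}) {
  if (!Number.isInteger(canaryPercent) || canaryPercent < 0 || canaryPercent > 100) {
    throw new RangeError('canaryPercent must be an integer between 0 and 100');
  }
  // One weighted CNAME record per stage; SetIdentifier tells them apart.
  const record = (setIdentifier, target, weight) => ({
    Action: 'UPSERT',
    ResourceRecordSet: {
      Name: recordName,
      Type: 'CNAME',
      SetIdentifier: setIdentifier,
      Weight: weight,
      TTL: 60, // short TTL so weight changes take effect quickly
      ResourceRecords: [{ Value: target }],
    },
  });
  return {
    HostedZoneId: hostedZoneId,
    ChangeBatch: {
      Changes: [
        record('production', productionTarget, 100 - canaryPercent),
        record('canary', canaryTarget, canaryPercent),
      ],
    },
  };
}
```

Promoting the canary then amounts to deploying the same build to the production stage and calling the function again with canaryPercent set to 0.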


So that’s it folks, hope you’ve enjoyed this post, feel free to leave a com­ment if you have any fol­low up ques­tions or tell me what else you’d like to hear about in part 3.



