AWS Lambda — should you have few monolithic functions or many single-purposed functions?

A funny moment (at 38:50) happened during Tim Bray’s session (SRV306) at re:Invent 2017, when he asked the audience whether we should have many simple, single-purposed functions or fewer monolithic functions, and the room was pretty much split in half.

Hav­ing been brought up on the SOLID prin­ci­ples, and espe­cial­ly the sin­gle respon­si­bil­i­ty prin­ci­ple (SRP), this was a moment that chal­lenged my belief that fol­low­ing the SRP in the server­less world is a no-brain­er.

That prompt­ed this clos­er exam­i­na­tion of the argu­ments from both sides.

Full dis­clo­sure, I am biased in this debate. If you find flaws in my think­ing, or sim­ply dis­agree with my views, please point them out in the com­ments.

What is a monolithic function?

By “monolithic functions”, I mean functions that have internal branching logic based on the invocation event and can do one of several things.

For example, you can have one function handle several HTTP endpoints and methods and perform different actions based on path and method.

module.exports.handler = (event, context, cb) => {
  // Route on the HTTP path and method carried by the invocation event.
  const path = event.path;
  const method = event.httpMethod;
  if (path === '/user' && method === 'GET') {
    // get user
  } else if (path === '/user' && method === 'DELETE') {
    // delete user
  } else if (path === '/user' && method === 'POST') {
    // create user
  } else {
    // other endpoints & methods
  }
};

What is the real problem?

One can’t ratio­nal­ly rea­son about and com­pare solu­tions with­out first under­stand­ing the prob­lem and what qual­i­ties are most desired in a solu­tion.

And when I hear com­plaints such as:

hav­ing so many func­tions is hard to man­age

I immediately wonder: what does “manage” entail? Is it to find specific functions you’re looking for? Is it to discover what functions you have? Does this become a problem when you have 10 functions or 100 functions? Or does it become a problem only when you have more developers working on them than you’re able to keep track of?

Drawing from my own experiences, the problem we’re dealing with has less to do with what functions we have than with what features and capabilities we possess through these functions.

After all, a Lamb­da func­tion, like a Dock­er con­tain­er, or an EC2 serv­er, is just a con­duit to deliv­er some busi­ness fea­ture or capa­bil­i­ty you require.

You wouldn’t ask “Do we have a get-user-by-facebook-id function?”, since that requires you to know what the function is called before you even know whether the capability exists and whether it’s captured by a Lambda function. Instead, you would probably ask “Do we have a Lambda function that can find a user based on his/her Facebook ID?”.

So the real problem is this: given that we have a complex system that consists of many features and capabilities, and that is maintained by many teams of developers, how do we organize these features and capabilities into Lambda functions so that they are optimised for:

  • dis­cov­er­abil­i­ty: how do I find out what fea­tures and capa­bil­i­ties exist in our sys­tem already, and through which func­tions?
  • debug­ging: how do I quick­ly iden­ti­fy and locate the code I need to look at to debug a prob­lem? e.g. there are errors in sys­tem X’s logs, where do I find the rel­e­vant code to start debug­ging the sys­tem?
  • scaling the team: how do I minimise friction as the engineering team grows?

These are the qual­i­ties that are most impor­tant to me. With this knowl­edge, I can com­pare the 2 approach­es and see which is best suit­ed for me.

You might care about dif­fer­ent qual­i­ties, for exam­ple, you might not care about scal­ing the team, but you real­ly wor­ry about the cost for run­ning your server­less archi­tec­ture. What­ev­er it might be, I think it’s always help­ful to make those design goals explic­it, and make sure they’re shared with and under­stood (maybe even agreed upon!) by your team.

Discoverability

Discoverability is by no means a new problem. According to Simon Wardley, it’s rather rampant in both government and the private sector, with most organisations lacking a systematic way for teams to share and discover each other’s work.

As men­tioned ear­li­er, what’s impor­tant here is the abil­i­ty to find out what capa­bil­i­ties are avail­able through your func­tions, rather than what func­tions are there.

An argument I often hear for monolithic functions is that they reduce the number of functions, which makes them easier to manage.

On the surface, this seems to make sense. But the more I think about it, the more it strikes me that the number of functions would only be an impediment to our ability to manage our Lambda functions IF we try to manage them by hand rather than using the tools available to us already.

After all, if we are able to locate books by their con­tent (“find me books on the sub­ject of X”) in a huge phys­i­cal space with 10s of thou­sands of books, how can we strug­gle to find Lamb­da func­tions when there are so many tools avail­able to us?

With a sim­ple nam­ing con­ven­tion, like the one that the Serverless frame­work enforces, we can quick­ly find relat­ed func­tions by pre­fix.

For exam­ple, if I want to find all the func­tions that are part of our user API, I can do that by search­ing for user-api.

With tag­ging, we can also cat­a­logue func­tions across mul­ti­ple dimen­sions, such as envi­ron­ment, fea­ture name, what type of event source, the name of the author, and so on.

By default, the Serverless framework adds the STAGE tag to all of your functions. You can also add your own tags; see the framework’s documentation on how to add tags. The Lambda management console also gives you a handy dropdown list of the available values when you try to search by a tag.
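To make the idea concrete, here is a small in-memory sketch of looking functions up by name prefix and by tag. It does not call any AWS API; the function inventory is invented, the names assume the Serverless framework’s default {service}-{stage}-{function} naming, and every tag other than STAGE is a hypothetical example.

```javascript
// Hypothetical inventory of deployed functions. Only the STAGE tag is
// something the framework adds by default; "feature" is an invented tag.
const functions = [
  { name: 'user-api-dev-get-user', tags: { STAGE: 'dev', feature: 'user-profile' } },
  { name: 'user-api-dev-delete-user', tags: { STAGE: 'dev', feature: 'user-profile' } },
  { name: 'user-api-prod-get-user', tags: { STAGE: 'prod', feature: 'user-profile' } },
  { name: 'order-api-dev-create-order', tags: { STAGE: 'dev', feature: 'checkout' } },
];

// Find related functions by name prefix.
const findByPrefix = (fns, prefix) =>
  fns.filter((fn) => fn.name.startsWith(prefix));

// Find functions whose tags match every key/value pair in the query.
const findByTags = (fns, query) =>
  fns.filter((fn) =>
    Object.entries(query).every(([key, value]) => fn.tags[key] === value));

console.log(findByPrefix(functions, 'user-api').map((fn) => fn.name));
console.log(findByTags(functions, { STAGE: 'dev', feature: 'user-profile' })
  .map((fn) => fn.name));
```

The point is only that both lookups stay cheap as the list grows, provided the naming and tagging conventions are applied consistently.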

If you have a rough idea of what you’re look­ing for, then the no. of func­tions is not going to be an imped­i­ment to your abil­i­ty to dis­cov­er what’s there.

On the other hand, the capabilities of the user-api are immediately obvious with single-purposed functions, where I can see from the relevant functions that I have the basic CRUD capabilities because there are corresponding functions for each.

With a mono­lith­ic func­tion, how­ev­er, it’s not imme­di­ate­ly obvi­ous, and I’ll have to either look at the code myself, or have to con­sult with the author of the func­tion, which for me, makes for pret­ty poor dis­cov­er­abil­i­ty.

Because of this, I will mark the mono­lith­ic approach down on dis­cov­er­abil­i­ty.

Hav­ing more func­tions though, means there are more pages for you to scroll through if you just want to explore what func­tions are there rather than look­ing for any­thing spe­cif­ic.

Although, in my expe­ri­ence, with all the func­tions nice­ly clus­tered togeth­er by name pre­fix thanks to the nam­ing con­ven­tion the Server­less frame­work enforces, it’s actu­al­ly quite nice to see what each group of func­tions can do rather than hav­ing to guess what goes on inside a mono­lith­ic func­tion.

But, I guess it can be a pain to scroll through every­thing when you have thou­sands of func­tions. So, I’m going to mark sin­gle-pur­posed func­tions down only slight­ly for that. I think at that lev­el of com­plex­i­ty, even if you reduce the no. of func­tions by pack­ing more capa­bil­i­ties into each func­tion, you will still suf­fer more from not being able to know the true capa­bil­i­ties of those mono­lith­ic func­tions at a glance.

Debugging

In terms of debug­ging, the rel­e­vant ques­tion here is whether or not hav­ing few­er func­tions makes it eas­i­er to quick­ly iden­ti­fy and locate the code you need to look at to debug a prob­lem.

Based on my experience, the trail of breadcrumbs that leads you from, say, an HTTP error or an error stack trace in the logs, to the relevant function and then the repo is the same regardless of whether the function does one thing or many different things.

What will be dif­fer­ent, is how you’d find the rel­e­vant code inside the repo for the prob­lems you’re inves­ti­gat­ing.

A mono­lith­ic func­tion that has more branch­ing and in gen­er­al does more things, would under­stand­ably take more cog­ni­tive effort to com­pre­hend and fol­low through to the code that is rel­e­vant to the prob­lem at hand.

For that, I’ll mark mono­lith­ic func­tions down slight­ly as well.

Scaling

One of the early arguments that got thrown around for microservices is that they make scaling easier, but that’s just not the case. If you know how to scale a system, then you can scale a monolith just as easily as you can scale a microservice.

I say that as some­one who has built mono­lith­ic back­end sys­tems for games that had a mil­lion dai­ly active users. Super­cell, the par­ent com­pa­ny for my cur­rent employ­er, and cre­ator of top gross­ing games like Clash of Clans and Clash Royale, have well over 100 mil­lion dai­ly active users on their games and their back­end sys­tems for these games are all mono­liths as well.

Instead, what we have learnt from tech giants like the Amazons, Netflixes, and Googles of this world is that a service-oriented style of architecture makes it easier to scale in a different dimension: our engineering team.

This style of archi­tec­ture allows us to cre­ate bound­aries with­in our sys­tem, around fea­tures and capa­bil­i­ties. In doing so it also allows our engi­neer­ing teams to scale the com­plex­i­ty of what they build as they can more eas­i­ly build on top of the work that oth­ers have cre­at­ed before them.

Take Google’s Cloud Datastore, for example. The engineers working on that service were able to produce a highly sophisticated service by building on top of many layers of services, each providing a powerful layer of abstraction.

These ser­vice bound­aries are what gives us that greater divi­sion of labour, which allows more engi­neers to work on the sys­tem by giv­ing them areas where they can work in rel­a­tive iso­la­tion. This way, they don’t con­stant­ly trip over each oth­er with merge con­flicts, and inte­gra­tion prob­lems, and so on.

Michael Nygard also wrote a nice arti­cle recent­ly that explains this ben­e­fit of bound­aries and iso­la­tion in terms of how it helps to reduce the over­head of shar­ing men­tal mod­els.

“if you have a high coherence penalty and too many people, then the team as a whole moves slower… It’s about reducing the overhead of sharing mental models.”

- Michael Nygard

Having lots of single-purposed functions is perhaps the pinnacle of that division of labour, and something you lose a little when you move to monolithic functions. Although in practice, you probably won’t end up having so many developers working on the same project that you feel the pain, unless you really pack them in with those monolithic functions!

Restricting a function to doing just one thing also helps limit how complex a function can become. To make something more complex you would instead compose these simple functions together via other means, such as with AWS Step Functions.
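As a rough illustration of that kind of composition, the sketch below builds a Step Functions state machine definition (in Amazon States Language) that chains two single-purposed functions. The state names, the account ID, the region and the function names in the ARNs are all hypothetical; this is only meant to show the shape of the definition, not a deployable workflow.

```javascript
// A sketch of an Amazon States Language definition that runs two
// single-purposed functions in sequence. All names and ARNs below are
// hypothetical placeholders.
const stateMachine = {
  Comment: 'Create a user, then send a welcome email',
  StartAt: 'CreateUser',
  States: {
    CreateUser: {
      Type: 'Task',
      Resource: 'arn:aws:lambda:us-east-1:123456789012:function:user-api-dev-create-user',
      Next: 'SendWelcomeEmail',
    },
    SendWelcomeEmail: {
      Type: 'Task',
      Resource: 'arn:aws:lambda:us-east-1:123456789012:function:user-api-dev-send-welcome-email',
      End: true,
    },
  },
};

// Step Functions takes the definition as a JSON document.
console.log(JSON.stringify(stateMachine, null, 2));
```

Each function stays simple, and the more complex behaviour lives in the definition that connects them.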

Once again, I’m going to mark mono­lith­ic func­tions down for los­ing some of that divi­sion of labour, and rais­ing the com­plex­i­ty ceil­ing of a func­tion.

Conclusion

As you can see, based on the cri­te­ria that are impor­tant to me, hav­ing many sin­gle-pur­posed func­tions is clear­ly the bet­ter way to go.

Like every­one else, I come pre­loaded with a set of pre­dis­po­si­tions and bias­es formed from my expe­ri­ences, which quite like­ly do not reflect yours. I’m not ask­ing you to agree with me, but to sim­ply appre­ci­ate the process of work­ing out the things that are impor­tant to you and your orga­ni­za­tion, and how to go about find­ing the right approach for you.

How­ev­er, if you dis­agree with my line of think­ing and the argu­ments I put for­ward for my selec­tion cri­te­ria — dis­cov­er­abil­i­ty, debug­ging, and scal­ing the team & com­plex­i­ty of the sys­tem — then please let me know via com­ments.
