Making Amazon SimpleWorkflow simpler to work with

Amazon SimpleWorkflow (abbreviated to SWF from here on) is a workflow service provided by Amazon that lets you model business processes as workflows using a task-based programming model. The service provides reliable task dispatch and state management so that you can focus on developing ‘workers’ to perform the tasks required to move the execution of a workflow along.

Introduction to SWF

For more information about SWF, have a look at the following introductory webinar.

There are two types of tasks:

  • Activity task – tells an ‘activity worker’ to perform a specific function, e.g. check inventory or charge a credit card.
  • Decision task – tells a ‘decider’ that the state of a workflow execution has changed so that it can determine the next course of action, e.g. continue to the next activity in the workflow, or complete the workflow if all activities have been successfully completed.

Both the activity worker and the decider need to poll the SWF service for tasks and, after receiving a task, respond with a result or with decisions respectively. Each task is associated with one or more timeout values; if no response is received before a timeout expires, the task times out and can be rescheduled (a decision task is rescheduled automatically by the system).

Since tasks can be polled from just about anywhere (an EC2 instance, or your home computer/laptop) and the tasks received can belong to any number of currently executing workflows, both the activity worker and the decider should be completely stateless, which allows them to be distributed across any number of locations both inside and outside the AWS ecosystem.

The history of each workflow execution (a sequence of events, each keyed to a unique ID) is available to view in the AWS Management Console, which gives you plenty of information when investigating, for instance, why a workflow failed.

[Screenshots: workflow execution history in the AWS Management Console]

Consider the following example from the SWF Developer Guide:

[Figure: Sample Workflow Overview]

Each of the steps can be represented as an activity task. Along the way the decider receives decision tasks and, by inspecting the history of events so far, can schedule the next activity in the workflow, for example:

[Image: decider pseudo code]
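To make that decision step concrete, here is a minimal Python sketch (the library discussed later is F#, but Python keeps these illustrations short and runnable). It assumes a greatly simplified, hypothetical history in which each event is an (event_type, activity_name) pair, and a hypothetical four-step order workflow. The event and decision type names mirror SWF's own (ActivityTaskCompleted, ScheduleActivityTask and so on), but the data model is not the real SWF API.

```python
# Hypothetical order-processing steps, in the order they must run.
STEPS = ["verify_order", "charge_card", "ship_order", "record_completion"]

FAILURE_EVENTS = ("ActivityTaskFailed", "ActivityTaskTimedOut")


def decide(history):
    """Return the next decision, derived purely from the event history.

    `history` is a simplified list of (event_type, activity_name) pairs;
    a real SWF history carries far more detail per event.
    """
    completed = [name for event, name in history
                 if event == "ActivityTaskCompleted"]
    failed = [name for event, name in history if event in FAILURE_EVENTS]

    if failed:
        # This sketch gives up on the first failure; a real decider may retry.
        return ("FailWorkflowExecution", failed[-1])
    if len(completed) == len(STEPS):
        return ("CompleteWorkflowExecution", None)
    # Otherwise schedule the first step that has not completed yet.
    return ("ScheduleActivityTask", STEPS[len(completed)])


print(decide([("WorkflowExecutionStarted", None)]))
# ('ScheduleActivityTask', 'verify_order')
print(decide([("WorkflowExecutionStarted", None),
              ("ActivityTaskCompleted", "verify_order")]))
# ('ScheduleActivityTask', 'charge_card')
```

Because decide looks only at the history it is handed, it keeps no state between calls, which is what lets deciders run anywhere.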

The actual SWF API is rather different, so the pseudo code above tends to translate into something rather more involved, which brings us to the topic of...

Shortcomings of SWF

Workflows are modelled implicitly

In my opinion, the biggest shortcoming of SWF is that the workflow itself (an ordered sequence of activities) is implied by the decider logic, and at no point while working with the service does it feel like you’re actually modelling a workflow. This might not be an issue in simple cases, but as you string together more and more activities (and potentially child workflows), pass data from one activity to the next and deal with failure cases, the decider logic becomes much more complex and difficult to maintain.

Need for boilerplate

The .Net AWSSDK provides a straight mapping to the set of actions available on the SWF service and adds very little value for developers, because as it stands every workflow requires boilerplate code to:

  • poll for decision tasks (multiple times if you need to go back further than the max 100 events per request)
  • inspect the history of events after receiving a decision task
  • schedule the next activity or complete the workflow based on the latest events
  • poll for activity tasks
  • record heartbeats periodically when processing an activity task
  • respond with a completed message on successful completion of the activity task
  • capture exceptions during the processing of a task and respond with a failed message
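Most of that list is identical from one workflow to the next. The Python sketch below shows two of the generic pieces: following nextPageToken to assemble the full event history, and running an activity task with background heartbeats followed by a completed or failed response. The client object and its method names (poll_for_decision_task, record_heartbeat, respond_completed, respond_failed) are hypothetical; they loosely mirror the SWF actions PollForDecisionTask, RecordActivityTaskHeartbeat, RespondActivityTaskCompleted and RespondActivityTaskFailed but are not the AWSSDK's API.

```python
import threading


def fetch_full_history(client, task_list):
    """Poll for a decision task and follow nextPageToken until the whole
    event history has been collected; returns (task_token, events)."""
    page = client.poll_for_decision_task(task_list, next_page_token=None)
    task_token = page["taskToken"]
    events = list(page["events"])
    while page.get("nextPageToken"):
        page = client.poll_for_decision_task(
            task_list, next_page_token=page["nextPageToken"])
        events.extend(page["events"])
    return task_token, events


def run_activity_task(client, task, work, heartbeat_interval=30.0):
    """Run work(input) for one activity task, heartbeating in the background,
    then report completion or failure back to the (hypothetical) client."""
    token = task["taskToken"]
    done = threading.Event()

    def heartbeat_loop():
        # Event.wait returns False on timeout and True once `done` is set.
        while not done.wait(heartbeat_interval):
            client.record_heartbeat(token)

    beater = threading.Thread(target=heartbeat_loop, daemon=True)
    beater.start()
    error = None
    try:
        result = work(task["input"])
    except Exception as exc:  # any failure is reported, not swallowed
        result, error = None, exc
    finally:
        done.set()      # stop heartbeating before responding
        beater.join()

    if error is None:
        client.respond_completed(token, result)
    else:
        client.respond_failed(token, reason=type(error).__name__,
                              details=str(error))
```

Heartbeats are stopped before the result is reported, so no heartbeat is sent for a task that has already been closed.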

Many of these steps are common across all deciders and activity workers, and it’s left to you to implement this missing layer of abstraction. Java developers have access to the heavyweight Flow Framework, which lets you specify activities and workflows declaratively (using annotations), giving you more of a sense of modelling the workflow and its constituent activities. However, as you can see from the canonical ‘hello world’ example, a lot of code is required to carry out even a simple workflow, not to mention the various framework concepts one would have to learn...

A lightweight, intuitive abstraction layer is badly needed.

All activity and workflow types must be registered

Every workflow and every activity needs to be explicitly registered with SWF before it can be executed. Like workflow executions, registered workflow and activity types can be viewed directly in the AWS Management Console:

[Screenshot: registered workflow and activity types in the AWS Management Console]

This registration can be done programmatically (as is the case with the Flow Framework) or via the AWS Management Console. The programmatic approach is clearly preferred but, again, as far as .Net developers are concerned, it’s an automation step you’d have to implement yourself, along with a versioning scheme for both workflows and activities. As a developer who just wants to model and implement a workflow with SWF, registration represents another step in the development process that you would rather do without.

Another thing to keep in mind is that when you have more than one activity with the same name, belonging to different workflows and requiring different tasks to be performed, you need a way to distinguish between them so that the corresponding activity workers do not pick up the wrong task.

SimpleWorkflow.Extensions

Driven by the pain of developing against SWF because of its numerous shortcomings (pain-driven development...), I started working on an extension library to the .Net AWSSDK that gives .Net developers an intuitive API to model workflows and handles all the necessary boilerplate tasks (such as exception handling), so that you can truly focus on modelling workflows and not worry about all the other plumbing required for working with SWF.

Intuitive modelling API

The simple ‘hello world’ example given by the Flow Framework can be modelled with fewer than 10 lines of code that are far easier to understand:

Here the ++> operator attaches an activity or child workflow to an existing workflow (in this case an empty one) and returns a new Workflow instance rather than modifying the existing one, in the spirit of functional programming and immutability.
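The same ‘attach a step, get back a new workflow’ idea can be sketched in Python. Python has no ++> operator, so this hypothetical Flow class overloads >> instead; Step and Flow are illustrative names only, showing the immutability design choice rather than the library’s API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union


@dataclass(frozen=True)
class Step:
    """Hypothetical activity: a name plus a string -> string function."""
    name: str
    func: Callable[[str], str]


@dataclass(frozen=True)
class Flow:
    """Hypothetical immutable workflow: an ordered tuple of steps."""
    name: str
    steps: Tuple[Union[Step, "Flow"], ...] = ()

    def __rshift__(self, step):
        # Python has no `++>` operator, so `>>` plays its role here.
        # A new Flow is returned; `self` is never modified.
        return Flow(self.name, self.steps + (step,))


empty = Flow("hello_world")
hello = empty >> Step("greet", lambda name: "hello " + name)
both = hello >> Step("echo", lambda text: text)

print(len(empty.steps), len(hello.steps), len(both.steps))  # 0 1 2
```

Because every intermediate workflow is a separate value, a partially built workflow can be reused as the starting point for several others without any risk of one affecting the next.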

An activity in SWF terms can, in essence, be thought of as a function that takes an input (string), performs some task and returns a result (string). Hence the Activity class you see above accepts a function with the signature string -> string, though there is also a generic variant Activity<TInput, TOutput> that takes a function with the signature TInput -> TOutput and uses the ServiceStack.Text JSON serializer (billed as the fastest JSON serializer for .Net) to marshal data to and from string.
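The generic variant is essentially an adapter: deserialize the string input, call the typed function, serialize the result. Here is a minimal Python sketch of that adapter, using the standard json module as a stand-in for ServiceStack.Text; typed_activity and the order-total example are hypothetical.

```python
import json
from typing import Any, Callable


def typed_activity(func: Callable[[Any], Any]) -> Callable[[str], str]:
    """Adapt a function over structured values to the string -> string shape
    an SWF activity deals in, marshalling through JSON in both directions."""
    def run(raw_input: str) -> str:
        value = json.loads(raw_input)   # string -> TInput
        result = func(value)            # TInput -> TOutput
        return json.dumps(result)       # TOutput -> string
    return run


def order_total(order):
    """Hypothetical typed activity body: total the lines of an order."""
    return {"id": order["id"],
            "total": sum(line["qty"] * line["price"] for line in order["lines"])}


total_activity = typed_activity(order_total)
print(total_activity('{"id": 7, "lines": [{"qty": 2, "price": 5}, {"qty": 1, "price": 3}]}'))
# {"id": 7, "total": 13}
```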

Exchanging data between activities

The input to the workflow execution is passed to the first activity as input, the result provided by the first activity is then passed to the second activity as input, and so on. This exchange of data also extends to child workflows, for example:

Starting a workflow execution with the input ‘theburningmonk’ prints the following output to the console:

MacDonald: hello theburningmonk!
MacDonald: good bye, theburningmonk!
Old MacDonald had a farm
EE-I-EE-I-O

To visualize the sequence of events and how data is exchanged from one activity to the next:

starts main workflow “with_child_workflow” with input “theburningmonk”
-> “theburningmonk” is passed as input to the activity “greet”
-> calls curried function greet “MacDonald” with “theburningmonk”
-> greet function prints “MacDonald: hello theburningmonk!” to console
-> greet function returns “theburningmonk”
-> “theburningmonk” is passed as input to activity “bye”
-> calls curried function bye “MacDonald” with “theburningmonk”
-> bye function prints “MacDonald: good bye, theburningmonk!” to console
-> bye function returns “MacDonald”
-> “MacDonald” is used as input to start the child workflow “sing_along”
-> “MacDonald” is passed as input to the activity “sing”
-> calls function sing with “MacDonald”
-> sing function prints “Old MacDonald had a farm” to console
-> sing function returns “EE-I-EE-I-O”
-> the child workflow “sing_along” completes with result “EE-I-EE-I-O”
-> “EE-I-EE-I-O” is passed as input to the activity “echo”
-> calls function echo with “EE-I-EE-I-O”
-> echo function prints “EE-I-EE-I-O” to console
-> echo function returns “EE-I-EE-I-O”
-> main workflow “with_child_workflow” completes with result “EE-I-EE-I-O”
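That data flow can be checked without touching SWF at all. The following is a local, in-process Python simulation; execute and the (name, step) pair representation are hypothetical, not the library’s API. Each workflow threads the result of one step into the next, and a child workflow is treated as a single step whose result is the result of its last step. The functions reproduce the behaviour in the trace, including bye returning “MacDonald”.

```python
def execute(steps, data):
    """Run a list of (name, step) pairs locally, with no SWF involved.
    A step is either a string -> string function (an activity) or a
    nested list of (name, step) pairs (a child workflow)."""
    for name, step in steps:
        if isinstance(step, list):   # child workflow: run its own steps
            data = execute(step, data)
        else:                        # activity: call its function
            data = step(data)
    return data                      # a workflow's result = last step's result


def greet(who):  # curried: greet("MacDonald") is a string -> string function
    def inner(name):
        print(f"{who}: hello {name}!")
        return name
    return inner


def bye(who):
    def inner(name):
        print(f"{who}: good bye, {name}!")
        return who
    return inner


def sing(name):
    print(f"Old {name} had a farm")
    return "EE-I-EE-I-O"


def echo(text):
    print(text)
    return text


sing_along = [("sing", sing)]
with_child_workflow = [
    ("greet", greet("MacDonald")),
    ("bye", bye("MacDonald")),
    ("sing_along", sing_along),   # child workflow receives "MacDonald"
    ("echo", echo),
]

result = execute(with_child_workflow, "theburningmonk")
```

Running it prints the same four console lines listed earlier, and result ends up as “EE-I-EE-I-O”. In the real library each hand-off goes through SWF as a string, with the decider scheduling every step.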

Error and retry mechanism

You can optionally specify the maximum number of attempts (e.g. max 3 attempts = original attempt + 2 retries) to make for each activity or child workflow before letting it fail or time out and failing the workflow.
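In SWF a retry amounts to the decider scheduling the same activity again after a failure or timeout event. Here is a minimal sketch of that rule, assuming a simplified, hypothetical history of (event_type, activity_name) pairs; the attempt count is recomputed from the history on every call, so the decider stays stateless.

```python
FAILURE_EVENTS = ("ActivityTaskFailed", "ActivityTaskTimedOut")


def on_activity_failure(history, activity, max_attempts=3):
    """Decide what to do after an attempt of `activity` failed or timed out.

    max_attempts counts the original attempt, so 3 means 1 + 2 retries.
    """
    failures = sum(1 for event, name in history
                   if name == activity and event in FAILURE_EVENTS)
    if failures < max_attempts:
        return ("ScheduleActivityTask", activity)   # retry
    return ("FailWorkflowExecution", activity)      # out of attempts


history = [("ActivityTaskFailed", "greet")]
print(on_activity_failure(history, "greet"))  # ('ScheduleActivityTask', 'greet')
history += [("ActivityTaskTimedOut", "greet"), ("ActivityTaskFailed", "greet")]
print(on_activity_failure(history, "greet"))  # ('FailWorkflowExecution', 'greet')
```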

Automatic workflow and activity registration

The domain, workflow and activity types are all registered automatically (if they haven’t been registered already) when you start a workflow. You might notice that you don’t need to specify a version for any of the activities; this is because a convention-based versioning scheme is in place (see below).
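The ‘register only if not already registered’ step could look something like the following hypothetical Python sketch. The already set and the register callback stand in for whatever the service reports as registered and for the actual registration calls, and the domain name and workflow version "1" are made up for the example. The domain comes first because types are registered within a domain.

```python
def ensure_registered(already, domain, workflow_type, activity_types, register):
    """Register whatever is missing, in dependency order: the domain first,
    then the workflow type, then each activity type.

    already        -- set of keys known to be registered (updated in place)
    workflow_type  -- (name, version)
    activity_types -- list of (name, version)
    register       -- hypothetical stand-in for the real registration call
    Returns the keys registered by this call.
    """
    wanted = [("domain", domain, None), ("workflow", *workflow_type)]
    wanted += [("activity", name, version) for name, version in activity_types]

    newly_registered = []
    for key in wanted:
        if key not in already:
            register(key)
            already.add(key)
            newly_registered.append(key)
    return newly_registered


calls = []
known = set()
spec = ("demo-domain", ("with_child_workflow", "1"),
        [("greet", "with_child_workflow.0"), ("bye", "with_child_workflow.1")])
ensure_registered(known, *spec, register=calls.append)
ensure_registered(known, *spec, register=calls.append)  # second run is a no-op
print(len(calls))  # 4
```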

Versioning scheme

Deriving a versioning scheme for your activities is at best an arbitrary decision, and one that SWF requires; it adds friction to the development process without adding much value for developers.

The versioning scheme I’m using derives the version from the workflow name and the activity’s zero-based position in that workflow: if an activity ‘echo’ is part of a workflow ‘with_child_workflow’ and is the 4th activity in the workflow, then the version for this particular instance of the ‘echo’ activity is with_child_workflow.3.

This scheme allows you to:

  • decouple the name of an activity from its delegate function
  • reuse the same activity name in different workflows, and allow them to perform different tasks if need be
  • reuse the same activity name for different activities in the same workflow, and allow them to perform different tasks if need be
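The convention fits in a couple of lines. In this hypothetical sketch every step, the child workflow included, occupies a position, which is what makes ‘echo’ the 4th step and gives it the version with_child_workflow.3. How the library versions the child workflow type itself is not covered here.

```python
def step_versions(workflow_name, step_names):
    """Convention-based versions: '<workflow name>.<zero-based position>'."""
    return [(name, f"{workflow_name}.{position}")
            for position, name in enumerate(step_names)]


for name, version in step_versions("with_child_workflow",
                                   ["greet", "bye", "sing_along", "echo"]):
    print(name, version)
# greet with_child_workflow.0
# bye with_child_workflow.1
# sing_along with_child_workflow.2
# echo with_child_workflow.3

# The same name twice in one workflow, or in two workflows, still yields
# distinct (name, version) pairs, so workers can tell the tasks apart.
print(step_versions("wf_a", ["greet", "greet"]))
# [('greet', 'wf_a.0'), ('greet', 'wf_a.1')]
print(step_versions("wf_b", ["greet"]))
# [('greet', 'wf_b.0')]
```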

Asynchronous execution

Nearly all of the communication with SWF (polling, responding with results, etc.) is done asynchronously using non-blocking IO (via F# async workflows).
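F# async workflows let a small number of threads keep many long polls in flight. As a rough analogue using Python’s asyncio rather than F#, the sketch below runs several simulated polls concurrently. long_poll is a hypothetical stand-in for a real SWF poll, which the service can hold open for up to 60 seconds when no task is available.

```python
import asyncio


async def long_poll(task_list: str, delay: float) -> str:
    """Hypothetical stand-in for a long poll: waits without blocking a thread."""
    await asyncio.sleep(delay)
    return f"task from {task_list}"


async def poll_all(task_lists, delay=0.1):
    # All polls are in flight at once, so the total wait is roughly `delay`,
    # not len(task_lists) * delay. gather preserves the input order.
    return await asyncio.gather(*(long_poll(tl, delay) for tl in task_lists))


if __name__ == "__main__":
    print(asyncio.run(poll_all(["deciders", "activity-workers"])))
    # ['task from deciders', 'task from activity-workers']
```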


Currently, the extension library can only be used from F#; I’m still undecided on the API for C# (since you won’t be able to use the ++> custom operator) and would welcome any suggestions you might have!

As you can see from the Issues list, there are still a couple of things I want to add support for, and a NuGet package should be made available in the near future. If you want to try it out in the meantime, feel free to grab the source and run the various examples I have added to the ExampleFs project.

Enjoy!