Introduction to AWS SimpleWorkflow Extensions Part 1 – Hello World example

Series so far:

2. Beyond Hello World

3. Parallelizing activities

 

In my previous post I mentioned some of the shortcomings with Amazon SimpleWorkflow (SWF) which drove me to create an extension library on top of the standard .Net SDK to make it easier to model workflows and business processes using SWF.

In this series of blog posts I’ll give you more examples of how to use the library to model workflows to be executed against the SWF service to take advantage of the reliable state management and task dispatch it offers, but none of the plumbing and boilerplate code you would have to deal with using the SDK.

Before we start looking at examples, let’s have a quick recap of the SWF terminology:

  • A workflow is a sequence of steps that are loosely strung together by the decisions the decider makes each time the state of the workflow changes. E.g. step 1 complete then schedule step 2 to commence.
  • A workflow execution is an instance of a particular workflow currently being executed; many executions of the same workflow (identified by name and version) can be in flight at the same time. A workflow execution can be started with a string as input and can return a string as output.
  • A decision task is a task that is scheduled each time a workflow’s state changes.
  • A decider is a component in your application which is responsible for polling SWF for decision tasks and responding with decisions. The sequence of steps that need to be performed by the workflow is ultimately determined by the decider.
  • An activity task is a task that is scheduled by a decider. It takes a string as input (along with several other pieces of data which it can be scheduled with) and returns a string as its result.
  • An activity worker is a component in your application which is responsible for polling SWF for activity tasks and responding with completion or failure signals, as well as providing regular heartbeat signals. If the decider is responsible for scheduling work to be done, then the activity worker is responsible for doing the actual work.
  • A child workflow is a workflow that is scheduled by the decider as a step in a workflow, similar to an activity.
  • The decider is able to schedule both child workflows and activities for a single step in a workflow. Whilst child workflows can be rerun as an independent unit of work, activities cannot be rerun independently outside of the context of a workflow.
  • Both workflows and activities need to be registered with the SWF service before they can be used.

These are the most common concepts/components you’ll see in SWF, but there are also less commonly used (in my opinion at least) features such as:

  • Starting a timer to cause a timer event to be fired after some time.
  • Signalling an external workflow execution to cause an event to be recorded in its execution history and a decision task to be scheduled. This is a useful way to allow inter-workflow communication, e.g. one workflow suspends itself until another workflow sends it a signal, and then it can resume its execution.
  • Recording a marker as a means to provide additional information in the execution history of a workflow.

 

Example: Hello World

Consider a workflow where there is only one activity, which simply prints the input to the screen and echoes it back out.

If we start a workflow execution with the input “Hello World!” then we expect to see the input printed to the console and the workflow execution complete with the result “Hello World!”.

image

In essence, you can think of an activity as nothing more than a function which accepts a string as argument and returns a string, i.e. a function with the signature string -> string in F#.

With the standard .Net SDK you will need to write a decider for each workflow in order to provide the orchestration you need for that workflow. The decider logic tends to quickly become difficult to understand and maintain when the decision logic becomes more complicated, e.g. when multiple activities and child workflows are scheduled in parallel, and you need to retry/fail activities/workflows, etc.

In my view, the decider is largely plumbing that developers should do without, so with the extensions library you should not need to write any custom decider code but instead, simply declare what activities and/or child workflows should be scheduled at each stage of a workflow and let the library do all the heavy lifting for you!

As far as workflow modelling is concerned, the only thing you need to do is use the custom ++> operator (inspired by Dave Thomas’s pipelets project) to attach additional steps to your workflow. So the above workflow can be modelled as:

and that’s it! There’s no need to register the workflow and activity or to write a bespoke decider and activity worker yourself; the library does all of that for you. All you need to do is model the workflow you want.

Notice you haven’t had to provide any reference to SWF at all thus far, in fact, you only need to provide an instance of AmazonSimpleWorkflowClient (from the AWS SDK) when you start the workflow:

image

This way, it’s possible to run the workflow across multiple accounts simultaneously (dev, staging, prod, etc.) by calling the Start method with each of the client instances (one for each account), which fits well with the mobile worker model SWF is designed with – SWF holds the state but you can run your workers from anywhere in and out of the AWS ecosystem.

Once you’ve started the workflow, the library will automatically register the domain, workflow and activity for you if they are not present already. You can verify this by looking in the SWF Management Console:

image

Notice that whilst we didn’t specify the “echo” activity with a version number, it’s registered with “echo.0”? I’ll go into more details on the versioning scheme in a later post, but for now let’s just be glad that we didn’t have to register these by hand!

Next, you can start a workflow directly from the management console by ticking the workflow you want to start and clicking the “Start New Execution” button:

image

Let’s follow through with the dialogue box and set the input as Hello World! as below:

image image

Once you start the workflow execution you will see Hello World! being printed in the console:

image

This is a sign that our echo function (which is invoked by the generated activity worker) has been called.

Back in the SWF Management Console, if you look under Workflow Executions, you should see the execution is closed after having completed successfully:

image

Clicking on the workflow execution ID allows you to see the sequence of events which were recorded for this execution:

image

This is a very granular view of what happened during the workflow execution, giving you plenty of useful information if you ever need to investigate why a workflow execution failed, for instance.

If you switch to the Activities tab, you’ll get a more condensed view with just the activities that were scheduled, along with their inputs, results, etc.

image

For now, ignore the JSON string in the Control field and the format of the Activity ID. Both are generated automatically by the library based on a set of conventions and will be covered in a later post.

So that’s it! I hope you can see that this extension library gives you a powerful way to express and model a workflow and focus your development efforts on the things that count (designing the process and writing the code that does the actual work) rather than wasting precious developer time on getting your code to work with SWF!

 

Parting Thoughts

For Java developers, there is an existing high-level framework (provided by Amazon itself) for working with SWF called the Flow Framework, which adopts a more object-oriented approach and in my opinion requires far more plumbing and most importantly does not

In case you’re wondering, this is how a solution to a similar Hello World example looks using the flow framework (taken straight from the flow framework developer guide) for your comparison:

Making Amazon SimpleWorkflow simpler to work with

Amazon SimpleWorkflow (abbreviated to SWF from here on) is a workflow service provided by Amazon which allows you to model business processes as workflows using a task based programming model. The service provides reliable task dispatch and state management so that you can focus on developing ‘workers’ to perform the tasks that are required to move the execution of a workflow along.

Introduction to SWF

For more information about SWF, have a look at the following introductory webinar.

There are two types of tasks:

  • Activity task – tells an ‘activity worker’ to perform a specific function, e.g. check inventory or charge a credit card.
  • Decision task – tells a ‘decider’ that the state of a workflow execution has changed so that it can determine what the next course of action should be, e.g. continue to the next activity in the workflow, or complete the workflow if all activities have been successfully completed.

Both the activity worker and the decider need to poll the SWF service for tasks and respond with a result or a set of decisions respectively after receiving a task. Each task is associated with one or more timeout values, and if no response is received before the timeout expires then the task will time out and can be rescheduled (in the case of a decision task, it is rescheduled automatically by the system).

Since tasks can be polled from just about anywhere (from an EC2 instance, or your home computer/laptop) and the tasks received can be part of any number of currently executing workflows, both the activity worker and decider should be completely stateless and can be distributed across any number of locations both inside and outside of the AWS ecosystem.

The history (as a sequence of events each keyed to a unique ID) of each workflow execution is available to view in the AWS Management Console so that you have plenty of information to aid you when investigating why workflows failed, for instance.

imageimage

 

Consider the following example given by the SWF developer guide:

Sample Workflow Overview

Each of the steps can be represented as an activity task. Along the way the decider receives decision tasks and, by inspecting the history of events thus far, schedules the next activity in the workflow, e.g.

image
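To make the idea concrete, here is a minimal sketch of that kind of decider logic in Python. It is an illustration only: the event history is a plain list of dictionaries rather than anything returned by the real SWF API, the step names are hypothetical, and only the event and decision type names are borrowed from SWF.

```python
# Hypothetical step names; a real workflow would define its own.
STEPS = ["verify_order", "charge_credit_card", "ship_order"]


def decide(history):
    """Return the next decision given the events recorded so far.

    `history` is a simplified stand-in for an SWF execution history:
    a list of dicts with a "type" key and, for activity events, an
    "activity" key.
    """
    # Any failed activity fails the whole workflow in this sketch.
    if any(e["type"] == "ActivityTaskFailed" for e in history):
        return {"decision": "FailWorkflowExecution"}

    completed = [e["activity"] for e in history
                 if e["type"] == "ActivityTaskCompleted"]

    # All steps done: close the workflow.
    if len(completed) == len(STEPS):
        return {"decision": "CompleteWorkflowExecution"}

    # Otherwise schedule the step that follows the last completed one.
    return {"decision": "ScheduleActivityTask",
            "activity": STEPS[len(completed)]}
```

Note that the workflow itself never appears as a first-class thing here: the order of the steps is only implied by how the function reads the history.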

The actual SWF API is rather different so the pseudo code above tends to translate to something slightly more involved, which brings us to the topic of..

Shortcomings of SWF

Workflows are modelled implicitly

In my opinion the biggest shortcoming with SWF is that the workflow itself (an ordered sequence of activities) is implied by the decider logic, and at no point as you work with the service does it feel like you’re actually modelling a workflow. This might not be an issue in simple cases, but as you string together more and more activities (and potentially child workflows), pass data along from one activity to the next and deal with failure cases, the decider logic becomes much more complex and difficult to maintain.

Need for boilerplate

The .Net AWSSDK provides a straight mapping to the set of actions available on the SWF service and provides very little added value to developers because as it stands every workflow requires boilerplate code to:

  • poll for decision task (multiple times if you need to go back further than the max 100 events per request)
  • inspect history of events after receiving a decision task
  • schedule next activity or complete workflow based on last events
  • poll for activity task
  • record heartbeats periodically when processing an activity task
  • respond with a completed message on successful completion of the activity task
  • capture exceptions during the processing of a task and respond with a failed message
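The activity worker half of that list can be sketched roughly as follows. This is a Python illustration under stated assumptions: `FakeSwf` and its method names are a hypothetical in-memory stand-in, not the AWS SDK, and the loop stops when the fake queue is empty whereas a real worker would keep polling.

```python
import threading


class FakeSwf:
    """In-memory stand-in for the service (hypothetical, NOT the AWS SDK)."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.log = []

    def poll_for_activity_task(self):
        return self.tasks.pop(0) if self.tasks else None

    def record_heartbeat(self, token):
        self.log.append(("heartbeat", token))

    def respond_completed(self, token, result):
        self.log.append(("completed", token, result))

    def respond_failed(self, token, reason):
        self.log.append(("failed", token, reason))


def run_worker(swf, handler, heartbeat_every=0.05):
    """Poll for tasks, heartbeat while working, report success or failure."""
    while True:
        task = swf.poll_for_activity_task()
        if task is None:  # the fake queue is drained; a real worker polls again
            break
        token = task["token"]
        done = threading.Event()

        def beat():
            # Record a heartbeat every `heartbeat_every` seconds until done.
            while not done.wait(heartbeat_every):
                swf.record_heartbeat(token)

        beater = threading.Thread(target=beat, daemon=True)
        beater.start()
        try:
            outcome = ("completed", handler(task["input"]))
        except Exception as exc:  # capture any failure and report it
            outcome = ("failed", repr(exc))
        finally:
            done.set()
            beater.join()

        if outcome[0] == "completed":
            swf.respond_completed(token, outcome[1])
        else:
            swf.respond_failed(token, outcome[1])
```

Only the `handler` function is specific to the activity; everything else is the same for every worker, which is exactly the kind of code worth writing once.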

Many of these steps are common across all deciders and activity workers and it’s left to you to implement this missing layer of abstraction. Java developers have access to a heavy-weight Flow Framework which allows you to declaratively (using annotations) specify activities and workflows, giving you more of a sense of modelling the workflow and its constituent activities. However, as you can see from the canonical ‘hello world’ example, a lot of code is required to carry out even a simple workflow, not to mention the various framework concepts one would have to learn.

A light-weight, intuitive abstraction layer is badly needed.

All activity and workflow types must be registered

Every workflow and every activity needs to be explicitly registered with SWF before they can be executed, and like workflow executions, registered workflow and activity types can be viewed directly in the AWS Management Console:

image

This registration can be done programmatically (as is the case with the Flow Framework) or via the AWS Management Console. The programmatic approach is clearly preferred but again, as far as .Net developers are concerned, it’s an automation step which you’d have to implement yourself, along with devising a versioning scheme for both workflows and activities. As a developer who just wants to model and implement a workflow with SWF, the registration represents another step in the development process which you would rather do without.

Another thing to keep in mind is that, where you have more than one activity with the same name but belonging to different workflows and requiring different tasks to be performed, you need a way to distinguish between the activities so that the corresponding activity workers do not pick up the wrong task.

SimpleWorkflow.Extensions

Driven by the pain of developing against SWF because of its numerous shortcomings (pain-driven development…) I started working on an extension library to the .Net AWSSDK to give .Net developers an intuitive API to model workflows and handle all the necessary boilerplate tasks (such as exception handling, etc.) so that you can truly focus on modelling workflows and not worry about all the other plumbing required for working with SWF.

Intuitive modelling API

The simple ‘hello world’ example given by the Flow Framework can be modelled with fewer than 10 lines of code that are far easier to understand:

Here the ++> operator attaches an activity or child workflow to an existing empty workflow and returns a new instance of Workflow rather than modifying the existing workflow (in the spirit of functional programming and immutability).
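The immutability point is easy to show in isolation. The sketch below is in Python and is not the library’s API: Python cannot define a `++>` operator, so `>>` is used as a stand-in, and the `Workflow` class here is a hypothetical toy that only records step names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """Toy immutable workflow: a name plus a tuple of step names."""
    name: str
    steps: tuple = ()

    def __rshift__(self, step):
        # Build and return a new Workflow instead of mutating this one.
        return Workflow(self.name, self.steps + (step,))


empty = Workflow("hello_world")
with_echo = empty >> "echo"   # `empty` is left untouched
```

Because every attachment yields a fresh value, a partially built workflow can be safely reused as the starting point for several variants.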

An activity in SWF terms can, in essence, be thought of as a function which takes an input (string), performs some task and returns a result (string). Hence the Activity class you see above accepts a function of the signature string -> string, though there is a generic variant Activity<TInput, TOutput> which takes a function of signature TInput -> TOutput and uses ServiceStack.Text JSON serializer (the fastest JSON serializer for .Net) to marshal data to and from string.
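As a rough illustration of what the generic variant does, the following Python sketch wraps a function over structured values so that it presents a string -> string face to the outside. It uses Python’s standard `json` module rather than ServiceStack.Text, and `typed_activity` and `total` are hypothetical names invented for the example.

```python
import json


def typed_activity(func):
    """Adapt a function over JSON-compatible values to string -> string."""
    def run(raw_input: str) -> str:
        # Deserialize the input, call the typed function, serialize the result.
        return json.dumps(func(json.loads(raw_input)))
    return run


def total(order):
    """Example typed function: sum price * quantity over the order items."""
    return {"total": sum(i["price"] * i["qty"] for i in order["items"])}


activity = typed_activity(total)
```

The function that does the work never sees a raw string, while the service still only ever exchanges strings.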

Exchanging data between activities

The input to the workflow execution is passed to the first activity as input, and the result provided by the first activity is then passed to the second activity as input and so on. This exchange of data also extends to child workflows, for example:

Starting a workflow execution with the input ‘theburningmonk’ prints the following outputs to the console:

MacDonald: hello theburningmonk!

MacDonald: good bye, theburningmonk!

Old MacDonald had a farm

EE-I-EE-I-O

To visualize the sequence of events and how data is exchanged from one activity to the next:

starts main workflow “with_child_workflow” with input “theburningmonk”

-> “theburningmonk” is passed as input to the activity “greet”

-> calls curried function greet “MacDonald” with “theburningmonk”

-> greet function prints “MacDonald: hello theburningmonk!” to console

-> greet function returns “theburningmonk”

-> “theburningmonk” is passed as input to activity “bye”

-> calls curried function bye “MacDonald” with “theburningmonk”

-> bye function prints “MacDonald: good bye, theburningmonk!” to console

-> bye function returns “MacDonald”

-> “MacDonald” is used as input to start the child workflow “sing_along”

-> “MacDonald” is passed as input to the activity “sing”

-> calls function sing with “MacDonald”

-> sing function prints “Old MacDonald had a farm” to console

-> sing function returns “EE-I-EE-I-O”

-> the child workflow “sing_along” completes with result “EE-I-EE-I-O”

-> “EE-I-EE-I-O” is passed as input to the activity “echo”

-> calls function echo with “EE-I-EE-I-O”

-> echo function prints “EE-I-EE-I-O” to console

-> echo function returns “EE-I-EE-I-O”

-> main workflow “with_child_workflow” completes with result “EE-I-EE-I-O”
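The same chain can be reproduced locally with plain functions. This Python sketch follows the trace above step by step but involves no SWF at all: `run_steps` is a hypothetical helper that feeds each result into the next step, and the child workflow is modelled as just another step that runs its own list.

```python
from functools import partial


def greet(me, you):
    print(f"{me}: hello {you}!")
    return you


def bye(me, you):
    print(f"{me}: good bye, {you}!")
    return me


def sing(who):
    print(f"Old {who} had a farm")
    return "EE-I-EE-I-O"


def echo(text):
    print(text)
    return text


def run_steps(steps, workflow_input):
    """Pass the input to the first step and each result to the next step."""
    value = workflow_input
    for step in steps:
        value = step(value)
    return value


# The child workflow "sing_along" is itself a list of steps.
sing_along = partial(run_steps, [sing])

main = [partial(greet, "MacDonald"), partial(bye, "MacDonald"), sing_along, echo]
result = run_steps(main, "theburningmonk")
```

Running it prints the same four lines shown earlier and leaves “EE-I-EE-I-O” in `result`.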

Error and Retry mechanism

You can optionally specify the max number of attempts (e.g. max 3 attempts = original attempt + 2 retries) that should be made for each activity or child workflow before letting it fail/timeout and fail the workflow.
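The counting rule (max attempts = original attempt + retries) can be sketched in a few lines of Python. This is only a local illustration with a hypothetical `run_with_attempts` helper; with SWF a retry means scheduling the activity again through the service, not looping in-process.

```python
def run_with_attempts(func, value, max_attempts=3):
    """Call func(value) up to max_attempts times, then re-raise the last error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return func(value)
        except Exception as exc:
            last_error = exc  # remember the failure and try again
    raise last_error
```

With `max_attempts=3`, a function that fails twice and then succeeds still produces a result, while one that fails three times surfaces its last exception.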

Automatic workflow and activity registrations

The domain, workflow and activity types are all registered automatically (if they haven’t been registered already) when you start a workflow. You might notice that you don’t need to specify a version for each of the activities. This is because there is a convention-based versioning scheme in place (see below).

Versioning scheme

Devising a versioning scheme for your activities is at best an arbitrary decision, and one that is required by SWF; it adds friction to the development process without adding much value for developers.

The versioning scheme I’m using is such that if an activity ‘echo’ is part of a workflow ‘with_child_workflow’ and is the 4th activity in the workflow, then the version for this particular instance of ‘echo’ activity is with_child_workflow.3 (the index is zero-based).

This scheme allows you to:

  • decouple the name of an activity from the delegate function
  • reuse the same activity name in different workflows, and allow them to perform different tasks if need be
  • reuse the same activity name for different activities in the same workflow, and allow them to perform different tasks if need be
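A minimal Python sketch of such a convention is shown below. The `step_versions` helper is hypothetical, and it assumes the index is the zero-based position among all steps of the workflow, with a child workflow occupying a position like any activity; the library’s actual rules may differ in detail.

```python
def step_versions(workflow_name, step_names):
    """Pair each step name with a version of the form '<workflow>.<index>'."""
    return [(name, f"{workflow_name}.{index}")
            for index, name in enumerate(step_names)]


versions = step_versions("with_child_workflow",
                         ["greet", "bye", "sing_along", "echo"])
```

Because the version carries both the workflow name and the position, two steps that share a name still end up with distinct name and version pairs.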

Asynchronous execution

Nearly all of the communication with SWF (polling, responding with result, etc.) is done asynchronously using non-blocking IO (using F# async workflows).

 

Currently, the extension library can only be used from F#. I’m still undecided on the API for C# (because you won’t be able to use the ++> custom operator) and would welcome any suggestions you might have!

As you can see from the Issues list, there are still a couple of things I want to add support for, but you should see a Nuget package made available in the near future. If you want to try it out in the meantime, feel free to grab the source and run the various examples I have added in the ExampleFs project.

Enjoy!