Whilst searching for an elegant solution to apply string interning across a large number of classes (we’re talking about hundreds of classes here), it dawned on me that I could achieve this with ease using PostSharp’s LocationInterceptionAspect. All I needed was something along the lines of:

You can apply this attribute to a class or even a whole assem­bly and it’ll ensure every piece of string con­structed is interned, includ­ing string prop­er­ties and fields defined by its sub­class, which is exactly what I was after.

For exam­ple, take this triv­ial piece of code:

image

If you inspect the com­piled code for the Base class in ILSpy you will see some­thing along the lines of:

image

Notice how the setter for BaseStringProperty has been modified to invoke the OnSetValue method defined in our aspect above rather than running the original setter body directly. In this case, OnSetValue calls the String.Intern method to retrieve a reference to an interned instance of the string and sets the property to that reference.
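To make the effect of the rewritten setter concrete, here is a minimal hand-written sketch of a property whose setter interns its value before storing it. It is written in F# to match the other snippets on this page, the Customer type is purely hypothetical, and it only imitates what the woven code does; it is not PostSharp’s actual output.

```fsharp
open System

// hypothetical type with a hand-written interning setter
type Customer() =
    let mutable name : string = null
    member this.Name
        with get () = name
        and set (value : string) =
            // String.Intern throws on null, so pass null through untouched
            name <- (if isNull value then null else String.Intern value)

// two equal strings built at run time are two separate objects
let first  = String([| 'b'; 'o'; 'b' |])
let second = String([| 'b'; 'o'; 'b' |])
let distinctBefore = not (obj.ReferenceEquals(first, second))

// once stored through the interning setter they share a single instance
let a = Customer()
let b = Customer()
a.Name <- first
b.Name <- second
let sharedAfter = obj.ReferenceEquals(a.Name, b.Name)

printfn "distinct before: %b, shared after: %b" distinctBefore sharedAfter
```

The null check matters because String.Intern raises an ArgumentNullException when given null, which a real aspect applied to hundreds of classes would almost certainly run into.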

For more details on PostSharp’s inter­cep­tion aspects, I rec­om­mend read­ing Dustin Davis’s excel­lent posts on the topic:

Post­Sharp Prin­ci­ples: Day 7 Inter­cep­tion Aspects – Part 1

Post­Sharp Prin­ci­ples: Day 8 Inter­cep­tion Aspects – Part 2

 

As we’ve spec­i­fied the mul­ti­cast inher­i­tance behav­iour to mul­ti­cast the attribute to mem­bers of the chil­dren of the orig­i­nal ele­ment, the string prop­er­ties defined in both A and B classes are also sub­ject to the same string intern­ing treat­ment with­out us hav­ing to explic­itly apply the Inter­nAt­tribute on them:

image

 

F# Com­pat­i­ble

What’s more, this attribute also works with F# types, including record and discriminated union types. Take for instance:

image

If you look at the gen­er­ated C# code for the dis­crim­i­nated union type, the inter­nal MyDuType.CaseB type would look some­thing like the following:

image

Notice how the setter methods of the two internal item1 and item2 properties have been modified in much the same way as in the C# examples above. The public Item1 and Item2 properties are read-only and get their values from the internal properties instead.

Indeed, when a new instance of the CaseB type is con­structed, it is the inter­nal prop­er­ties whose val­ues are initialized:

image

 

Finally, let’s look at the record type, which inter­est­ingly also defines a non-string field:

image

Because we have specified that the InternAttribute should only be applied to properties or fields of type string (via the CompileTimeValidate method, which is executed as part of the post-compilation weaving process rather than at runtime), the internal representation of the Age field is left unaltered.

The Name field, how­ever, being of string type, was sub­ject to the same trans­for­ma­tion as all our other examples.
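The string-only rule boils down to a type check on each candidate member. The sketch below illustrates just that check using plain reflection over a hypothetical Person record; it does not use PostSharp and says nothing about how the weaver itself walks the members.

```fsharp
open System

// hypothetical record with one string member and one non-string member
type Person = { Name : string; Age : int }

// keep only the properties whose declared type is string
let stringPropertyNames (t : Type) =
    t.GetProperties()
    |> Array.filter (fun p -> p.PropertyType = typeof<string>)
    |> Array.map (fun p -> p.Name)

let selected = stringPropertyNames typeof<Person>

printfn "%A" selected
```

Here only Name passes the check, which mirrors why Age is left alone in the record example.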

 

I hope this little attribute proves useful to you too; it has certainly saved me from an unbearable amount of grunt work!


Hi, just a quick note to say that my talk with .Net Rocks is now available on their web site. In this talk I shared some insights into how F# is used in our stack to help us build the backend for our social games, specifically in the areas of:

 

Enjoy!


The series so far:

  1. Hello World exam­ple
  2. Beyond Hello World

 

Within a work­flow, not all activ­i­ties have to be per­formed sequen­tially. In fact, to increase through­put and/or reduce the over­all time required to fin­ish a work­flow, you might want to per­form sev­eral activ­i­ties in par­al­lel pro­vided that they don’t have any inter-dependencies and can be per­formed independently.

In this post we’re going to see how we can use the SWF extensions library to parallelize activities by scheduling several activity tasks in a single step and then aggregating their results into a single input for the next activity in the workflow.

image

The par­al­lelized activ­i­ties receive their input from either:

  1. the work­flow execution’s input if this is the first step of the work­flow, or
  2. the result of the pre­ced­ing activity/child work­flow in the workflow

As of now, the library requires you to specify a ‘reducer’ which is responsible for aggregating the results of the parallel activities into a single string, which is returned as the result of the step in the workflow. There are some caveats to this reducer function (of signature Dictionary<int, string> -> string right now), as you will see in the example. I’ll look to address these oddities and clean up the API in future versions of the library, so please bear with me for now.

The aggre­gate result of these par­al­lel activ­i­ties can then be passed along as the input to the sub­se­quent activ­ity as per the exam­ple in my pre­vi­ous post.
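As a rough illustration of the reducer’s shape, here is a minimal sketch of a function with the Dictionary<int, string> -> string signature that adds up numeric results. The totalReducer name and the sample values are made up for illustration, and the sketch assumes every value is a plain integer rendered as a string.

```fsharp
open System.Collections.Generic

// parse every result as an int, add them up, and hand back a single string
let totalReducer (results : Dictionary<int, string>) : string =
    results.Values |> Seq.sumBy int |> string

// keys are the zero-based positions of the parallel activities
let results = Dictionary<int, string>()
results.[0] <- "12"
results.[1] <- "3"
results.[2] <- "7"

let total = totalReducer results

printfn "%s" total
```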

Exam­ple : Count HTML ele­ment types

Suppose that, given a URL, you want to count the number of different HTML elements (e.g. <div>, <span>, …) the returned HTML contains. The counting of each element type is independent and can be carried out in parallel. For good measure, we can add an echo activity before and after the count activities so that we can print the input URL and the results to the screen. You will end up with a workflow that looks something like this:

image

The imple­men­ta­tion of this work­flow is as follows:

#r "bin/Release/AWSSDK.dll"
#r "bin/Release/SWF.Extensions.Core.dll"

open Amazon.SimpleWorkflow
open Amazon.SimpleWorkflow.Extensions

open System.Collections.Generic
open System.Net

// prints the input to the console and returns it unchanged
let echo str = printfn "%s" str; str

// a function to count the number of occurrences of a pattern inside the HTML returned
// by the specified URL address
let countMatches (pattern : string) (address : string) =
    use webClient = new WebClient()
    let html = webClient.DownloadString address

    // slide a window of the pattern's length over the HTML and count exact matches
    seq { 0..html.Length - pattern.Length }
    |> Seq.map (fun i -> html.Substring(i, pattern.Length))
    |> Seq.filter ((=) pattern)
    |> Seq.length

let echoActivity = Activity(
                        "echo", "echo input", echo,
                        taskHeartbeatTimeout       = 60, 
                        taskScheduleToStartTimeout = 10,
                        taskStartToCloseTimeout    = 10, 
                        taskScheduleToCloseTimeout = 20)

let countDivs = Activity<string, int>(
                        "count_divs", "count the number of <div> elements", 
                        countMatches "<div",
                        taskHeartbeatTimeout       = 60, 
                        taskScheduleToStartTimeout = 10,
                        taskStartToCloseTimeout    = 10, 
                        taskScheduleToCloseTimeout = 20)

let countScripts = Activity<string, int>(
                        "count_scripts", "count the number of <script> elements", 
                        countMatches "<script",
                        taskHeartbeatTimeout       = 60, 
                        taskScheduleToStartTimeout = 10,
                        taskStartToCloseTimeout    = 10, 
                        taskScheduleToCloseTimeout = 20)

let countSpans = Activity<string, int>(
                        "count_spans", "count the number of <span> elements", 
                        countMatches "<span",
                        taskHeartbeatTimeout       = 60, 
                        taskScheduleToStartTimeout = 10,
                        taskStartToCloseTimeout    = 10, 
                        taskScheduleToCloseTimeout = 20)

// the three count activities are scheduled in parallel as a single step
let countActivities = [| countDivs      :> ISchedulable
                         countScripts   :> ISchedulable
                         countSpans     :> ISchedulable |]

// the keys are the zero-based positions of the activities in the array above
let countReducer (results : Dictionary<int, string>) =
    sprintf "Divs : %d\nScripts : %d\nSpans : %d\n" (int results.[0]) (int results.[1]) (int results.[2])

let countElementsWorkflow = 
    Workflow(domain = "theburningmonk.com", name = "count_html_elements", 
             description = "this workflow counts", 
             version = "1")
    ++> echoActivity
    ++> (countActivities, countReducer)
    ++> echoActivity

let awsKey      = "PUT-YOUR-AWS-KEY-HERE"
let awsSecret   = "PUT-YOUR-AWS-SECRET-HERE"
let client = new AmazonSimpleWorkflowClient(awsKey, awsSecret)

countElementsWorkflow.Start(client)


Thanks to Tomas Pet­ricek’s FSharp.Formatting project I’m now able to pro­vide code snip­pets with intel­lisense! Tomas, you rock!

 

Run­ning the above exam­ple and start­ing a work­flow exe­cu­tion with the input

http://www.bing.com

out­puts the fol­low­ing to the console:

image

If you take a look at the his­tory of events below, the deci­sion task fol­low­ing the com­ple­tion of stage 0 (the first echo activ­ity) was com­pleted with the state (tracked in the Exe­cu­tion Con­text property):

{"CurrentStageNumber":1,"NumberOfActions":3,"Results":{}}

With­out going into too much details on the inner work­ings of the gen­er­ated decider, this JSON seri­al­ized state tells us that the work­flow has moved into stage no. 1, where there are a total of 3 actions, each rep­re­sented by an activ­ity task to count a par­tic­u­lar type of HTML element.
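One way to picture this state is as a small record that fills up as the parallel activities report back. The sketch below is only a mental model: the field names are borrowed from the JSON above, but the StageState type and its helper functions are hypothetical and are not taken from the library’s decider.

```fsharp
// hypothetical model of the state tracked in the Execution Context
type StageState =
    { CurrentStageNumber : int
      NumberOfActions    : int
      Results            : Map<int, string> }

// entering a stage starts with an empty set of results
let enterStage (stageNumber : int) (numberOfActions : int) : StageState =
    { CurrentStageNumber = stageNumber
      NumberOfActions    = numberOfActions
      Results            = Map.empty }

// record the result of one action against its zero-based index
let recordResult (actionIndex : int) (result : string) (state : StageState) : StageState =
    { state with Results = Map.add actionIndex result state.Results }

// the stage is done once every action has reported a result
let isStageComplete (state : StageState) : bool =
    state.Results.Count = state.NumberOfActions

let started  = enterStage 1 3
let partial  = started |> recordResult 0 "12" |> recordResult 2 "7"
let finished = partial |> recordResult 1 "3"

printfn "partial complete: %b, finished complete: %b" (isStageComplete partial) (isStageComplete finished)
```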

image

Switch­ing to the Activ­i­ties tab, you can see that 3 activ­i­ties were com­pleted at stage index 1 of the count_html_elements work­flow, judg­ing by the Activ­ity ID and Ver­sion of the 3 activities:

image

Caveats

Look­ing at the exam­ple code, a cou­ple of ques­tions jump out straight away:

Q. What is the ISchedu­la­ble interface?

A. The ISchedulable interface represents anything that can be scheduled as part of a workflow, i.e. an activity or a child workflow. Both IActivity and IWorkflow inherit from it, though in all the examples so far we’ve only worked directly against the concrete implementation types for these two interfaces.

 

Q. Why does the reducer take a Dictionary<int, string>?

A. As far as the reducer is concerned, it probably doesn’t need to. The main reason I’ve used a dictionary here is that when intermediate results are available (e.g. 2 out of 3 parallel activities have completed) I wanted to be able to show the current set of results in the Execution Context for the workflow execution (see screenshot above). Because we don’t have all the results back at that point, I needed to be able to show each result against its originating activity, hence a dictionary where the key is the zero-based index of the activity in the input array and the value is the string representation of the result.
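To see why an index-keyed dictionary suits partial results, consider this small sketch that renders whatever has come back so far. The describeProgress function and its output format are invented for illustration and are not part of the library.

```fsharp
open System.Collections.Generic

// show each activity's result by index, or "pending" if it has not reported yet
let describeProgress (numberOfActivities : int) (results : Dictionary<int, string>) : string =
    List.init numberOfActivities (fun index ->
        match results.TryGetValue index with
        | true, value -> sprintf "%d=%s" index value
        | _           -> sprintf "%d=pending" index)
    |> String.concat ", "

// activities 0 and 2 have completed, activity 1 is still running
let soFar = Dictionary<int, string>()
soFar.[0] <- "12"
soFar.[2] <- "7"

let progress = describeProgress 3 soFar

printfn "%s" progress
```

A plain list of results could not express the gap at index 1 without some placeholder value, whereas the dictionary simply has no entry for it.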

 

Q. Why, then, are the Dictionary’s values strings when the scheduled activity can be generic and return arbitrary types?

A. Because not all the activities have to return the same type.

Under the hood the generic Activity<TInput, TOutput> marshals data to and from JSON strings using the ServiceStack.Text JSON serializer, and the Activity type is just a special case where both TInput and TOutput are strings.

When the results are recorded and retrieved via SWF, they’re already in string format. Although it’s possible to inspect the originating activity’s generic type parameters to work out the returned type, catering for different return types would require the dictionary to be a Dictionary<int, object> instead, which is not any better.
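In practice this means the reducer decides how to interpret each string, since it knows which activity sits at which index. A minimal sketch, assuming three hypothetical activities that return an int, a bool and a string respectively, and assuming each value arrives as plain text that the standard .NET parsers accept:

```fsharp
open System
open System.Collections.Generic

// each index is parsed into the type its originating activity is known to return
let summarize (results : Dictionary<int, string>) : string =
    let count   = Int32.Parse(results.[0])
    let isValid = Boolean.Parse(results.[1])
    let title   = results.[2]
    sprintf "%s: %d items (valid: %b)" title count isValid

let mixed = Dictionary<int, string>()
mixed.[0] <- "3"
mixed.[1] <- "true"
mixed.[2] <- "report"

let summary = summarize mixed

printfn "%s" summary
```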

 

Q. So what if I want to return any­thing other than a string from the reducer function?

A. For now, you can use the ServiceStack.Text JSON serializer (for better compatibility) to serialize the return value to a string yourself. I’ll add support for the library to do this automatically in the version 1.1.0 release. It slipped my mind at the time, sorry…

 

Next

In the next post, I’ll show you how you can add child workflows into the mix. As I’ve mentioned above, workflows also implement the ISchedulable interface and can be scheduled into the workflow in the same way as activities.

To find out the latest announcements and updates on the Amazon.SimpleWorkflow.Extensions project, please follow the official Twitter account @swf_extensions, and as always, your feedback and comments on the project will be much appreciated!


The series so far:

1.   Hello World exam­ple

3.   Par­al­leliz­ing activities

 

In this post we’re going to go beyond the pre­vi­ous Hello World exam­ple and show you how to use the SWF exten­sions library to model work­flows with mul­ti­ple steps and allow data to flow nat­u­rally from one step to the next.

When using the extension library, the input to a workflow execution is passed onto the first activity of a workflow as input by default (as per the previous Hello World example), and the result of that activity is passed onto the next activity as input and so on. The result of the last activity is then used as the result of the whole workflow:

image
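Stripped of SWF entirely, this flow of data is just a fold over a list of functions. The sketch below models activities as plain string -> string functions; runWorkflow and the three sample steps are hypothetical names used only to illustrate the chaining, and nothing here touches the library or the service.

```fsharp
// feed the input to the first activity, its result to the second, and so on
let runWorkflow (activities : (string -> string) list) (input : string) : string =
    activities |> List.fold (fun current activity -> activity current) input

// three stand-in activities
let trim  (s : string) = s.Trim()
let upper (s : string) = s.ToUpperInvariant()
let greet (s : string) = "Hello, " + s + "!"

// the result of the last activity is the result of the whole workflow
let result = runWorkflow [ trim; upper; greet ] "  world "

printfn "%s" result
```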

When you model an activity using the Activity type you need to pass in a function with the signature string -> string. The generated activity worker calls this function against the input when it receives a task, and uses the function’s return value as the result of the activity.

What you might not realize is that the Activity type is actually a specialized form of the generic Activity<TInput, TOutput> type, which allows you to supply functions with arbitrary input and output types and simply uses the ServiceStack.Text JSON serializer to marshal data to and from strings. I decided to use the ServiceStack.Text serializer because it’s the fastest JSON serializer around based on my benchmarks.
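The marshalling idea can be sketched as wrapping a typed handler between a deserializer and a serializer so that it looks like string -> string from the outside. In the sketch below, toStringHandler, parseInts and formatInt are hypothetical, and the comma-separated format is a deliberately simple stand-in for the JSON serializer rather than what the library actually puts on the wire.

```fsharp
// turn a typed handler into a string -> string function
let toStringHandler (deserialize : string -> 'TInput) (serialize : 'TOutput -> string) (handler : 'TInput -> 'TOutput) : string -> string =
    deserialize >> handler >> serialize

// stand-ins for a real serializer: "1,2,3" <-> int[] and int -> "6"
let parseInts (s : string) : int[] = s.Split(',') |> Array.map int
let formatInt (n : int) : string = string n

// a typed int[] -> int handler exposed as string -> string
let sumHandler = toStringHandler parseInts formatInt Array.sum

let result = sumHandler "1,2,3"

printfn "%s" result
```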

Exam­ple : Sum Web Page Lengths

Sup­pose you want to count and sum the size of the HTML pages given a num­ber of URLs and return the sum as the result:

image

To implement this workflow you need to attach two activities to the workflow: the first requires a function that turns a string array into an int array, and the second aggregates the int array into a single integer value, something along the lines of:
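As a rough sketch, the two handler functions could look like the following. The download step is passed in as a parameter so that the example runs without network access, and getLengths, sumLengths and fakeDownload are all hypothetical names. In the real workflow each handler would be wrapped in an Activity and chained with the ++> operator.

```fsharp
// first handler: string[] -> int[], the length of each downloaded page
let getLengths (download : string -> string) (urls : string[]) : int[] =
    urls |> Array.map (fun url -> (download url).Length)

// second handler: int[] -> int, the sum of all the lengths
let sumLengths (lengths : int[]) : int =
    Array.sum lengths

// fake downloader: returns a "page" exactly as long as the URL itself
let fakeDownload (url : string) : string =
    String.replicate url.Length "x"

let total =
    [| "http://a"; "http://bb" |]
    |> getLengths fakeDownload
    |> sumLengths

printfn "%d" total
```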

The main things to take away from this example are that:

  1. you can attach multiple activities to a workflow by chaining them up with the ++> operator
  2. handler functions for activities do not have to have a string -> string signature

 

Let’s take a closer look at the two activities:

image

image

image

As you can see, given the input JSON string to the workflow:

[ "http://www.google.com", "http://www.yahoo.com", "http://www.bing.com" ]

the activities did what they were supposed to: they first translated the input into an int array and then summed the values to give a total for the length of the landing pages for Google, Yahoo and Bing. It is little surprise that Yahoo’s landing page is nearly an order of magnitude bigger than the rest!

 

Part­ing Thoughts

In this post I demon­strated how to model a work­flow with mul­ti­ple steps which accept and return arbi­trary types as input and out­put. In the next post I’ll demon­strate how to sched­ule mul­ti­ple activ­i­ties to be per­formed in par­al­lel as a sin­gle step of a workflow.


Series so far:

2. Beyond Hello World

3. Par­al­leliz­ing activities

 

In my pre­vi­ous post I men­tioned some of the short­com­ings with Ama­zon Sim­ple­Work­flow (SWF) which drove me to cre­ate an exten­sion library on top of the stan­dard .Net SDK to make it eas­ier to model work­flows and busi­ness processes using SWF.

In this series of blog posts I’ll give you more examples of how to use the library to model workflows to be executed against the SWF service to take advantage of the reliable state management and task dispatch it offers, without any of the plumbing and boilerplate code you would have to deal with using the SDK.

Before we start looking at examples, let’s have a quick recap of the SWF terminology:

  • A workflow is a sequence of steps that are loosely strung together by the decisions the decider makes each time the state of the workflow changes, e.g. when step 1 completes, schedule step 2 to commence.
  • A workflow execution is an instance of a particular workflow currently being executed. Many executions of the same workflow (identified by name and version) can be in flight at the same time. A workflow execution can be started with a string as input and it can return a string as output.
  • A decision task is a task that is scheduled each time a workflow’s state changes.
  • A decider is a component in your application which is responsible for polling SWF for decision tasks and responding with decisions. The sequence of steps that need to be performed by the workflow is ultimately determined by the decider.
  • An activity task is a task that is scheduled by a decider. It takes a string as input (along with several other pieces of data which it can be scheduled with) and returns a string as its result.
  • An activity worker is a component in your application which is responsible for polling SWF for activity tasks and responding with completion or failure signals, as well as providing regular heartbeat signals. If the decider is responsible for scheduling work to be done, then the activity worker is responsible for doing the actual work.
  • A child workflow is a workflow that is scheduled by the decider as a step in a workflow, similar to an activity.
  • The decider is able to schedule both child workflows and activities for a single step in a workflow. Whilst child workflows can be rerun as an independent unit of work, activities cannot be rerun independently outside of the context of a workflow.
  • Both workflows and activities need to be registered with the SWF service before they can be used.
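The split between the decider and the activity worker can be pictured with a small in-memory toy. None of the types or functions below come from the AWS SDK or the extension library; they are hypothetical stand-ins that only show who decides what happens next and who does the work, for a purely sequential workflow.

```fsharp
// events that change the state of a workflow execution
type Event =
    | WorkflowStarted of input : string
    | ActivityCompleted of stepIndex : int * result : string

// decisions the decider can respond with
type Decision =
    | ScheduleActivity of stepIndex : int * input : string
    | CompleteWorkflow of result : string

// the decider: looks at the latest event and decides what should happen next
let decide (numberOfSteps : int) (event : Event) : Decision =
    match event with
    | WorkflowStarted input when numberOfSteps = 0 -> CompleteWorkflow input
    | WorkflowStarted input -> ScheduleActivity (0, input)
    | ActivityCompleted (index, result) when index + 1 < numberOfSteps ->
        ScheduleActivity (index + 1, result)
    | ActivityCompleted (_, result) -> CompleteWorkflow result

// the activity worker: does the actual work for a scheduled step
let work (steps : (string -> string)[]) (index : int) (input : string) : Event =
    ActivityCompleted (index, steps.[index] input)

// alternate between deciding and working until the workflow completes
let rec run (steps : (string -> string)[]) (event : Event) : string =
    match decide (Array.length steps) event with
    | ScheduleActivity (index, input) -> run steps (work steps index input)
    | CompleteWorkflow result -> result

let steps = [| (fun (s : string) -> s.ToUpper()); (fun (s : string) -> s + "!") |]
let result = run steps (WorkflowStarted "hello")

printfn "%s" result
```

In the real service the two halves poll SWF independently and the execution history lives on Amazon’s side; the loop here merely stands in for that round trip.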

These are the most com­mon concepts/components you’ll see in SWF, but there are also less com­monly used (in my opin­ion at least) fea­tures such as:

  • Start­ing a timer to cause a timer event to be fired after some time.
  • Sig­nalling an exter­nal work­flow exe­cu­tion to cause an event to be recorded in its exe­cu­tion his­tory and a deci­sion task to be sched­uled. This is a use­ful way to allow inter-workflow com­mu­ni­ca­tion, e.g. one work­flow sus­pends itself, until another work­flow sends it a sig­nal and then it can resume with its execution.
  • Record­ing a marker as means to pro­vide addi­tional infor­ma­tion in the exe­cu­tion his­tory of a workflow.

 

Exam­ple : Hello World

Con­sider a work­flow where there is only one activ­ity, which sim­ply prints the input to the screen and echoes it back out.

If we start a work­flow exe­cu­tion with the input “Hello World!” then we expect to see the input being printed to the con­sole and then the work­flow exe­cu­tion com­pleted with the result “Hello World!”.

image

In essence, you can think of an activity as nothing more than a function which accepts a string as its argument and returns a string, i.e. a function with the signature string -> string in F#.

With the standard .Net SDK you will need to write a decider for each workflow in order to provide the orchestration you need for that workflow. The decider logic tends to quickly become difficult to understand and maintain when the decision logic becomes more complicated, e.g. when multiple activities and child workflows are scheduled in parallel, and you need to retry/fail activities/workflows, etc.

In my view, the decider is largely plumb­ing that devel­op­ers should do with­out, so with the exten­sions library you should not need to write any cus­tom decider code but instead, sim­ply declare what activ­i­ties and/or child work­flows should be sched­uled at each stage of a work­flow and let the library do all the heavy lift­ing for you!

As far as work­flow mod­el­ling is con­cerned, the only thing you need to do is use the cus­tom ++> oper­a­tor (inspired by Dave Thomas’s pipelets project) to attach addi­tional steps to your work­flow. So the above work­flow can be mod­elled as:
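A sketch of what such a model might look like, reusing the Workflow and Activity constructors and the ++> operator in the form they take in the parallelizing activities post of this series. The workflow name, description and timeout values are placeholders, the assembly paths are assumptions about your build output, and the exact constructor arguments may differ between versions of the library:

```fsharp
#r "bin/Release/AWSSDK.dll"
#r "bin/Release/SWF.Extensions.Core.dll"

open Amazon.SimpleWorkflow
open Amazon.SimpleWorkflow.Extensions

// the activity's handler: print the input and echo it back as the result
let echo str = printfn "%s" str; str

// placeholder names and timeouts (in seconds), chosen for illustration only
let helloWorldWorkflow =
    Workflow(domain = "theburningmonk.com", name = "hello_world",
             description = "simple 'hello world' example",
             version = "1")
    ++> Activity("echo", "echo input", echo,
                 taskHeartbeatTimeout       = 60,
                 taskScheduleToStartTimeout = 10,
                 taskStartToCloseTimeout    = 10,
                 taskScheduleToCloseTimeout = 20)
```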

And that’s it! There is no need to register the workflow and activity or write a bespoke decider and activity worker yourself; the library does all of that for you. All you need to do is model the workflow you want.

Notice you haven’t had to pro­vide any ref­er­ence to SWF at all thus far, in fact, you only need to pro­vide an instance of Ama­zon­Sim­ple­Work­flow­Client (from the AWS SDK) when you start the workflow:

image

This way, it’s pos­si­ble to run the work­flow across mul­ti­ple accounts simul­ta­ne­ously (dev, stag­ing, prod, etc.) by call­ing the Start method with each of the client instances (one for each account), which fits well with the mobile worker model SWF is designed with – SWF holds the state but you can run your work­ers from any­where in and out of the AWS ecosystem.

Once you’ve started the work­flow, the library will auto­mat­i­cally reg­is­ter the domain, work­flow and activ­ity for you if they are not present already. You can ver­ify this by look­ing in the SWF Man­age­ment Con­sole:

image

Notice that whilst we didn’t spec­ify the “echo” activ­ity with a ver­sion num­ber, it’s reg­is­tered with “echo.0”? I’ll go into more details on the ver­sion­ing scheme in a later post, but for now let’s just be glad that we didn’t have to reg­is­ter these by hand!

Next, you can start a workflow directly from the management console by ticking the workflow you want to start and clicking the “Start New Execution” button:

image

Let’s fol­low through with the dia­logue box and set the input as Hello World! as below:

image image

Once you start the work­flow exe­cu­tion you will see Hello World! being printed in the console:

image

This is a sign that our echo func­tion (which is invoked by the gen­er­ated activ­ity worker) had been called.

Back in the SWF Man­age­ment Con­sole, if you look under Work­flow Exe­cu­tions, you should see the exe­cu­tion is closed after hav­ing com­pleted successfully:

image

Click­ing on the work­flow exe­cu­tion ID allows you to see the sequence of events which had been recorded for this execution:

image

This is a very gran­u­lar view of what hap­pened dur­ing the work­flow exe­cu­tion, giv­ing you plenty of use­ful infor­ma­tion if you ever need to inves­ti­gate why a work­flow exe­cu­tion failed, for instance.

If you switch to the Activ­i­ties tab, you’ll get a more con­densed view with just the activ­i­ties that were sched­uled, along with their inputs, results, etc.

image

For now, ignore the JSON string in the Con­trol field and the for­mat of the Activ­ity ID, these are both auto­mat­i­cally gen­er­ated by the library based on a set of con­ven­tions and will be cov­ered by a later post.

So that’s it! I hope you can see that this exten­sion library gives you a pow­er­ful way to express and model a work­flow and focus your devel­op­ment efforts on the things that count (design­ing the process and writ­ing the code that does the actual work) rather than wast­ing pre­cious devel­oper time on get­ting your code to work with SWF!

 

Part­ing Thoughts

For Java developers, there is an existing high-level framework (provided by Amazon itself) for working with SWF called the Flow Framework, which adopts a more object-oriented approach and in my opinion requires far more plumbing and most importantly does not

In case you’re won­der­ing, this is how a solu­tion to a sim­i­lar Hello World exam­ple looks using the flow frame­work (taken straight from the flow frame­work devel­oper guide) for your comparison:
