Thursday, 10 September 2015

Workflow For Dataflow?

Adrian Colyer is writing some really insightful commentaries on research papers on distributed computing. His latest post, Go with the flow, is part of the series Out of the Fire Swamp. It prompted me to share an idea I had some time ago: using workflow to coordinate microservices as a replacement for transactions.

When you're implementing a request in, let's say, a web API, you may perform, alongside read operations, some updates to non-transactional resources such as a NoSQL datastore or another microservice.

What you typically want to offer the client is all-or-nothing semantics.  But when your server crashes in the middle of a request, it may have performed some, but not all, of the updates to those non-transactional resources.  This can lead to inconsistencies in your domain model.

In a way, a workflow is a persistent execution flow.  It stores the state of the execution so it can be resumed later.  Typical workflow engines probably don't do this efficiently enough for the use case I'm discussing here.  But I believe a workflow engine can be tuned for this use case of consistency.

Imagine a request to create an invoice that performs four requests to other services in a flow like this:

The new request implementation can first perform some reads and then replace the four individual updates with a single update that creates the workflow instance.  The workflow engine can ensure that the workflow instance is persisted in a single atomic update.  Afterwards, the workflow engine ensures that progress is persisted before and after each activity.  In case of a crash, the workflow engine can then resume from the persisted execution state.
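To make the idea concrete, here is a minimal sketch in Python, not a real workflow engine. Everything in it is an assumption made for illustration: the InMemoryStore class stands in for a durable store with atomic writes, and the step names and the workflow id "invoice-42" are hypothetical. It shows one write creating the workflow instance, progress being persisted before and after each activity, and a resume after a simulated crash.

```python
import json


class InMemoryStore:
    """Hypothetical stand-in for a durable store; each put() models one atomic write."""

    def __init__(self):
        self._rows = {}

    def put(self, key, record):
        self._rows[key] = json.dumps(record)  # serialize to mimic persistence

    def get(self, key):
        return json.loads(self._rows[key])


def start_workflow(store, workflow_id, steps, payload):
    # One atomic write replaces the individual updates of the request.
    store.put(workflow_id, {"steps": steps, "next": 0, "current": None, "payload": payload})


def resume_workflow(store, workflow_id, activities):
    """Run the remaining steps, persisting progress before and after each one."""
    state = store.get(workflow_id)
    while state["next"] < len(state["steps"]):
        name = state["steps"][state["next"]]
        state["current"] = name
        store.put(workflow_id, state)       # persisted before the activity
        activities[name](state["payload"])  # may be repeated after a crash
        state["next"] += 1
        state["current"] = None
        store.put(workflow_id, state)       # persisted after the activity
    return state


# Usage: the second activity crashes once, then the workflow is resumed.
log = []
failures = {"update_ledger": 1}  # number of simulated crashes per step


def make_activity(name):
    def activity(payload):
        if failures.get(name, 0) > 0:
            failures[name] -= 1
            raise RuntimeError(f"simulated crash in {name}")
        log.append(name)
    return activity


steps = ["store_invoice", "update_ledger", "notify_customer", "archive_pdf"]
activities = {name: make_activity(name) for name in steps}
store = InMemoryStore()
start_workflow(store, "invoice-42", steps, {"amount": 100})

try:
    resume_workflow(store, "invoice-42", activities)
except RuntimeError:
    pass  # the process "crashed"; the persisted state survives in the store

final = resume_workflow(store, "invoice-42", activities)  # picks up at update_ledger
```

In this sketch the activity that crashed is simply executed again on resume. A real engine would also have to decide what to do when it finds a step marked as current but not finished, since it cannot know whether the remote update went through.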

The actions performed should be idempotent, because it's typically not possible to guarantee exactly-once execution semantics in a distributed system: after a crash, the engine may re-run an activity that had already completed.
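One common way to get idempotent actions is to have the receiving service deduplicate on a key supplied by the caller. The sketch below assumes a hypothetical LedgerService and a key scheme of workflow id plus step name. Neither comes from a real API; they only illustrate that a retried call leaves the state unchanged.

```python
class LedgerService:
    """Hypothetical downstream service that deduplicates by idempotency key."""

    def __init__(self):
        self.entries = {}

    def add_entry(self, idempotency_key, amount):
        # A repeated key returns the earlier result instead of applying the update again.
        if idempotency_key not in self.entries:
            self.entries[idempotency_key] = amount
        return self.entries[idempotency_key]

    def total(self):
        return sum(self.entries.values())


ledger = LedgerService()
key = "invoice-42/update_ledger"  # workflow id plus step name (hypothetical scheme)
ledger.add_entry(key, 100)
ledger.add_entry(key, 100)  # retry after a crash: no double booking
```

With this in place the engine only needs at-least-once execution of each activity, which is what persisting progress and resuming can realistically provide.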

This workflow adds the guarantee that once it is started, it will be completed at some point.  Typically it's not necessary to block the request until the whole workflow is completed, although this is technically possible.

This idea looks really close to the event store as described in Adrian's blog.  I wonder if this could be a relevant piece in tomorrow's distributed computing puzzle.