Durable Functions: Fan Out Fan In Patterns


This post is a collaboration between myself and my awesome coworker, Maxime Rouiller.

Durable Functions? Wat. If you’re new to Durable, I suggest you start here with this post that covers all the essentials so that you can properly dive in. In this post, we’re going to dive into one particular use case so that you can see a Durable Function pattern at work!

Today, let’s talk about the Fan Out, Fan In pattern. We’ll do so by retrieving an open issue count from GitHub and then storing what we get. Here’s the repo where all the code lives that we’ll walk through in this post.

View Repo

About the Fan Out/Fan In Pattern

We briefly mentioned this pattern in the previous article, so let’s review. You’d likely reach for this pattern when you need to execute multiple functions in parallel and then perform some other task with those results. You can imagine that this pattern is useful for quite a lot of projects, because we often have to do one thing based on data from a few other sources.

For example, let’s say you are a takeout restaurant with a ton of orders coming through. You might use this pattern to first get the order, then use that order to figure out prices for all the items, the availability of those items, and see if any of them have any sales or deals. Perhaps the sales/deals are not hosted in the same place as your prices because they are controlled by an outside sales firm. You might also need to find out what your delivery queue is like and who on your staff should get it based on their location.

That’s a lot of coordination! But you’d need to then aggregate all of that information to complete the order and process it. This is a simplified, contrived example of course, but you can see how useful it is to work on a few things concurrently so that they can then be used by one final function.

Here’s what that looks like, in abstract code and visualization:

See the Pen Durable Functions: Pattern #2, Fan Out, Fan In by Sarah Drasner (@sdras) on CodePen.

const df = require('durable-functions')

module.exports = df(function*(ctx) {
  const tasks = []

  // fan out: get the items, then start one activity per item without yielding
  const taskItems = yield ctx.df.callActivityAsync('fn1')
  taskItems.forEach(item => tasks.push(ctx.df.callActivityAsync('fn2', item)))

  // fan in: wait for every task to finish and collect the results
  const results = yield ctx.df.Task.all(tasks)

  // send results to last function for processing
  yield ctx.df.callActivityAsync('fn3', results)
})

Now that we see why we would want to use this pattern, let’s dive in to a simplified example that explains how.

Setting up your environment to work with Durable Functions

First things first. We’ve got to get our development environment ready to work with Durable Functions. Let’s break that down.

GitHub Personal Access Token

To run this sample, you’ll need to create a personal access token in GitHub. Click your account photo to open the dropdown, select Settings, and then choose Developer settings in the left sidebar. In the same sidebar on the next screen, click the Personal access tokens option.

Then a prompt will come up and you can click the Generate new token button. You should give your token a name that makes sense for this project. Like “Durable functions are better than burritos.” You know, something standard like that.

For the scopes/permissions option, I suggest selecting “repo”. You can then click the Generate token button and copy the token to your clipboard. Please keep in mind that you should never commit your token. (It will be revoked if you do. Ask me why I know that.) If you need more info on creating tokens, there are further instructions here.

Functions CLI

First, we’ll install the latest version of the Azure Functions CLI. We can do so by running this in our terminal:

npm i -g azure-functions-core-tools@core --unsafe-perm true

Does the unsafe perm flag freak you out? It did for me as well. Really what it’s doing is preventing UID/GID switching when package scripts run, which is necessary because the package itself is a JavaScript wrapper around .NET. You can also install with Homebrew, which doesn’t need such a flag; more information about that is here.

Optional: Setting up the project in VS Code

Totally not necessary, but I like working in VS Code with Azure Functions because it has great local debugging, which is typically a pain with Serverless functions. If you haven’t already installed it, you can do so here:

  • Visual Studio Code
  • Azure functions Extension

Set up a Free Trial for Azure and Create a Storage Account

To run this sample, you’ll need to test drive a free trial for Azure. You can go into the portal and sign in from the left-hand corner. There, you’ll make a new Blob Storage account and retrieve its keys. Now that we have that all squared away, we’re ready to rock!

Setting up Our Durable Function

Let’s take a look at the repo we have set up. We’ll clone or fork it:

git clone https://github.com/Azure-Samples/durablefunctions-apiscraping-nodejs.git

Here’s what that initial file structure is like.

(This visualization was made from my CLI tool.)

In local.settings.json, change GitHubToken to the value you grabbed from GitHub earlier, and do the same for the two storage keys — paste in the keys from the storage account you set up earlier.

Then run:

func extensions install
npm i
func host start

And now we’re running locally!
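When the Functions host runs locally, the entries under Values in local.settings.json are exposed to your functions as environment variables. That means an activity could pick up the GitHubToken setting roughly like this. This is a minimal sketch: the readGitHubToken helper is hypothetical and not part of the sample repo.

```javascript
// Hypothetical helper (not from the sample repo): read the GitHub token
// that the Functions host exposes from local.settings.json as an env var.
function readGitHubToken(env = process.env) {
  const token = env.GitHubToken
  if (!token) {
    // Fail loudly so a missing setting is easy to spot while developing
    throw new Error('GitHubToken is not set; add it to local.settings.json')
  }
  return token
}

// Usage: pass a plain object to try it without touching the real environment
console.log(readGitHubToken({ GitHubToken: 'example-token' })) // example-token
```

Taking the environment as a parameter keeps the helper easy to exercise on its own, and the token itself never has to appear in source code.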

Understanding the Orchestrator

As you can see, we have a number of folders within the FanOutFanInCrawler directory. The functions in the GetAllRepositoriesForOrganization, GetOpenedIssues, and SaveRepositories directories are the ones that we will be coordinating.

Here’s what we’ll be doing:

  • The Orchestrator will kick off the GetAllRepositoriesForOrganization function, passing in the organization name. The name comes from the Orchestrator_HttpStart function, and the Orchestrator reads it with getInput()
  • Since this is likely to return more than one repo, we’ll first create an empty array, then loop through all of the repos, run GetOpenedIssues for each one, and push the resulting tasks onto the array. These calls all fire concurrently because we don’t yield on each one individually
  • Then we’ll wait for all of the tasks to finish executing and finally call SaveRepositories, which will store all of the results in Blob Storage

Since the other functions are fairly standard, let’s dig into that Orchestrator for a minute. If we look inside the Orchestrator directory, we can see it has a fairly traditional setup for a function with index.js and function.json files.

Generators

Before we dive into the Orchestrator, let’s take a very brief side tour into generators, because you won’t be able to understand the rest of the code without them.

A generator is not the only way to write this code! It could be accomplished with other asynchronous JavaScript patterns as well. It just so happens that this is a pretty clean and legible way to write it, so let’s look at it really fast.

function* generator(i) {
  yield i++;
  yield i++;
  yield i++;
}

var gen = generator(1);

console.log(gen.next().value); // 1
console.log(gen.next().value); // 2
console.log(gen.next().value); // 3
console.log(gen.next()); // {value: undefined, done: true}

The asterisk after function marks this as a generator function, and inside it you can use the yield keyword. Calling a generator function does not execute its body right away; an iterator object is returned instead. Each call to next() runs the function up to the next yield, and we’re given an object with both a value and done, a boolean that tells us whether the generator has finished. You can see in the example above that the last .next() call returns an object where done is true, letting us know we’ve iterated through all values.
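There’s one more generator feature worth knowing for what comes next: a value passed to next() becomes the result of the paused yield expression. That’s how code like `const x = yield something` can receive an answer from whoever is driving the generator. Here’s a minimal sketch of the idea. The run helper and the handlers object are hypothetical, and far simpler than whatever the durable-functions library really does.

```javascript
// A tiny, hypothetical runner that drives a generator step by step
function run(generatorFn, handlers) {
  const gen = generatorFn()
  let step = gen.next() // run until the first yield

  while (!step.done) {
    const request = step.value // whatever the generator yielded
    const result = handlers[request.name](request.input)
    step = gen.next(result) // result becomes the value of the yield expression
  }

  return step.value // the generator's return value
}

function* workflow() {
  const doubled = yield { name: 'double', input: 21 }
  const label = yield { name: 'describe', input: doubled }
  return label
}

const handlers = {
  double: n => n * 2,
  describe: n => `The answer is ${n}`
}

console.log(run(workflow, handlers)) // The answer is 42
```

The generator only describes what it wants done; the runner decides how and when to do it, then hands the result back.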

Orchestrator code

We’ll start with the require statement we’ll need for this to work:

const df = require('durable-functions')

module.exports = df(function*(context) {
  // our orchestrator code will go here
})

It’s worth noting that the asterisk there makes this a generator function.

First, we’ll get the organization name from the Orchestrator_HttpStart function and get all the repos for that organization with GetAllRepositoriesForOrganization. Note we use yield within the repositories assignment so that the orchestrator waits for the list of repos before moving on.

const df = require('durable-functions')

module.exports = df(function*(context) {
  var organizationName = context.df.getInput()
  var repositories = yield context.df.callActivityAsync(
    'GetAllRepositoriesForOrganization',
    organizationName
  )
})

Then we’re going to create an empty array named output, create a for loop from the array we got containing all of the organization’s repos, and use that to push the issues into the array. Note that we don’t use yield here so that they’re all running concurrently instead of waiting one after another.

const df = require('durable-functions')

module.exports = df(function*(context) {
  var organizationName = context.df.getInput()
  var repositories = yield context.df.callActivityAsync(
    'GetAllRepositoriesForOrganization',
    organizationName
  )

  var output = []
  for (var i = 0; i < repositories.length; i++) {
    output.push(
      context.df.callActivityAsync('GetOpenedIssues', repositories[i])
    )
  }
})
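To get a feel for why leaving out yield matters, here’s a plain-JavaScript analogy with ordinary promises standing in for Durable tasks. It’s only a sketch: fakeActivity is a made-up stub, the repo names are invented, and the Durable runtime schedules real activities differently under the hood.

```javascript
// Plain promises stand in for Durable tasks; fakeActivity is a made-up stub
const log = []

function fakeActivity(name) {
  log.push(`start ${name}`)
  return new Promise(resolve => {
    setTimeout(() => {
      log.push(`finish ${name}`)
      resolve(`${name}: done`)
    }, 10)
  })
}

// Fan out: calling without waiting kicks off every activity right away
const pending = ['repo-a', 'repo-b', 'repo-c'].map(name => fakeActivity(name))
console.log(log) // [ 'start repo-a', 'start repo-b', 'start repo-c' ]

// Fan in: wait for all of them, then work with the combined results
Promise.all(pending).then(results => console.log(results))
```

All three activities have started before any of them finishes. If we had waited on each call inside the loop, the second one wouldn’t start until the first was done.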

Finally, when all of these executions are done, we’re going to collect the results and pass them to the SaveRepositories function, which will save them to Blob Storage. Then we’ll return the unique ID of the instance (context.instanceId).

const df = require('durable-functions')

module.exports = df(function*(context) {
  var organizationName = context.df.getInput()
  var repositories = yield context.df.callActivityAsync(
    'GetAllRepositoriesForOrganization',
    organizationName
  )

  var output = []
  for (var i = 0; i < repositories.length; i++) {
    output.push(
      context.df.callActivityAsync('GetOpenedIssues', repositories[i])
    )
  }

  const results = yield context.df.Task.all(output)
  yield context.df.callActivityAsync('SaveRepositories', results)

  return context.instanceId
})

Now we’ve got all the steps we need to manage all of our functions with this single orchestrator!

Deploy

Now the fun part. Let’s deploy! 🚀

To deploy components, Azure requires you to install the Azure CLI and log in with it.

First, you will need to provision the service. Look into the provision.ps1 file that’s provided to familiarize yourself with the resources we are going to create. Then, you can execute the file with the previously generated GitHub token like this:

.\provision.ps1 -githubToken <TOKEN> -resourceGroup <ResourceGroupName> -storageName <StorageAccountName> -functionName <FunctionName>

If you don’t want to install PowerShell, you can also take the commands within provision.ps1 and run them manually.

And there we have it! Our Durable Function is up and running.