Flow-based Programming

Panta rhei (Πάντα ῥεῖ) - Everything flows.

FBP vs. FBP-inspired Systems

Some readers may have arrived at Flow-Based Programming (FBP) by way of NoFlo, a JavaScript-based system inspired by my book, Flow-based Programming, which uses a number of the same terms and concepts. This project, started by Henri Bergius in 2012 and now called Flowhub, implements a number of the concepts of FBP and has been creating a significant buzz worldwide since then. In the fall of 2016, Flowhub and other flow-based programming assets were purchased from The Grid, and are now operated by Flowhub UG, a company registered in Berlin, Germany.

While a lot of credit is due to the NoFlo team for bringing FBP to the attention of the computer world, the NoFlo implementation differs in a number of respects from "classical" FBP as it has evolved over the last 50+ years, and in fact operates on a completely different paradigm, although it shares a number of technical and philosophical ideas with "classical" FBP. In "classical" FBP, application development can be likened to designing a data processing "factory", which fundamentally changes the way designers think about system implementation; NoFlo, on the other hand, is still basically synchronous and procedural, although it does implement the componentry and "configurable modularity" features of FBP, in combination with a visual representation. We are now seeing a proliferation of such systems, including IBM's Node-RED, in a variety of programming languages, although JavaScript currently seems to be the most popular (at least in this particular arena). Because of their synchronous, procedural nature, systems of this type have a lower level of granularity, supporting finer-grained components and smaller units of data.

We should, however, point out that, unlike these systems, "classical" FBP constitutes a rather fundamental paradigm shift, which can be wrenching for programmers trained in conventional von Neumann thinking! On the other hand, it provides a "consistent application view, from maxi to mini", as the late American software engineer Wayne Stevens put it, and is compatible with many design techniques and other "flow" technologies. Because of the proliferation of synchronous packages, we will use the terms "FBP-like" or "FBP-inspired" (as suggested by Joe Witt of Cloudera) when it is necessary to distinguish them from "classical" FBP.

Ali Razeen, at Duke University, has pointed out in a 2015 note that such "FBP-like" implementations should not be viewed as true FBP implementations, as they are missing some key characteristics of true FBP - mainly, asynchronism, information packets with unique ownership and lifetime, and "reverse pressure" - and so typically miss out on the critical paradigm shift and a number of its attendant benefits.

John Cowan in the Google Group on FBP says the following:

The easiest way to understand the difference is to take the point of view of the component programmer.

In classical FBP, all components are autonomous. They can read from any of their input ports whenever they want to, and can write to any of their output ports whenever they want to. If the input port is empty or the output port full, the component waits transparently until things change. This is a model familiar to all programmers, because it is exactly how files are processed. A program does not sit in an event loop waiting for the OS to push the next block of a file at it, or send an event that says "Write your next block to the output file now".
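To make the autonomous model concrete, here is a minimal sketch in Go. It assumes each component is a goroutine and each connection a bounded channel; this mapping is an illustration, not the API of any FBP implementation. The component prefixer is hypothetical: it reads a single parameter from its ctl port before it touches its in port, and it blocks transparently whenever an input is empty or an output is full.

```go
package main

import "fmt"

// prefixer is a hypothetical autonomous component. It decides for itself
// which port to read next: one parameter from ctl, then every packet from in.
// Receiving on an empty channel, or sending on a full one, simply suspends
// this goroutine until the other side acts.
func prefixer(ctl <-chan string, in <-chan string, out chan<- string) {
	defer close(out) // signal end of stream downstream
	prefix := <-ctl
	for s := range in {
		out <- prefix + s
	}
}

// runPrefixer wires a tiny network around prefixer and collects its output.
func runPrefixer(prefix string, items []string) []string {
	ctl := make(chan string, 1)
	in := make(chan string, 2) // bounded connections (capacity 2)
	out := make(chan string, 2)

	go prefixer(ctl, in, out)
	go func() {
		defer close(in)
		for _, s := range items {
			in <- s // blocks while the connection is full
		}
	}()
	ctl <- prefix

	var got []string
	for s := range out {
		got = append(got, s)
	}
	return got
}

func main() {
	fmt.Println(runPrefixer("item-", []string{"a", "b", "c"})) // [item-a item-b item-c]
}
```

Note that the upstream goroutine may fill the in connection before the parameter arrives; it is simply suspended, and prefixer never needs to know.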

In FBP-inspired systems, typically any component can write to a port whenever it wants, but components only have a single input port and sit waiting for a packet to be pushed to them. This behavior allows a single-threaded implementation of the whole system, but each component is controlled by its upstream partner(s) rather than being autonomous. This pattern has become very familiar to GUI programmers, whose programs typically are event loops because they need to be responsive to unpredictable user actions, but the program has to be turned inside out (an "inversion of control") which makes certain natural techniques like recursion difficult or impossible.
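For contrast, here is a hypothetical push-style version of the same component, again sketched in Go but running entirely on one goroutine. The names pushPrefixer, OnCtl and OnData are invented for illustration. Because the component cannot choose to wait for its parameter, it has to keep state in fields and buffer any data that arrives too early: a small instance of the "inside out" structure described above.

```go
package main

import "fmt"

// pushPrefixer is a hypothetical push-style (FBP-inspired) component.
// It never reads: upstream calls its handlers, and it must keep state
// between calls in fields, buffering data that arrives "too early".
type pushPrefixer struct {
	prefix  string
	haveCtl bool
	pending []string
	emit    func(string) // downstream handler, called synchronously
}

// OnCtl is invoked by upstream when the parameter arrives.
func (p *pushPrefixer) OnCtl(prefix string) {
	p.prefix, p.haveCtl = prefix, true
	for _, s := range p.pending { // flush data that arrived before the parameter
		p.emit(p.prefix + s)
	}
	p.pending = nil
}

// OnData is invoked by upstream for every data packet.
func (p *pushPrefixer) OnData(s string) {
	if !p.haveCtl {
		p.pending = append(p.pending, s)
		return
	}
	p.emit(p.prefix + s)
}

func main() {
	var got []string
	p := &pushPrefixer{emit: func(s string) { got = append(got, s) }}
	// Everything runs on one goroutine; the caller dictates the order.
	p.OnData("a")
	p.OnCtl("item-")
	p.OnData("b")
	fmt.Println(got) // [item-a item-b]
}
```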

It should also be pointed out that FBP-like systems usually allow one output port to be connected to multiple input ports - this does not make any sense in a classical FBP system as it would be like being able to send the same package to several of your friends all at the same time!
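In a classical FBP network, when two downstream processes really do need the same data, one way to deliver it is an explicit component that creates a separate packet for each output. The Go sketch below uses a hypothetical ip type and a hypothetical replicate component. Each output receives a distinct packet, so a change made by one receiver cannot affect the other.

```go
package main

import "fmt"

// ip is a hypothetical information packet. Data is mutable, so letting two
// downstream processes share one *ip would let each see the other's changes.
type ip struct{ Data []byte }

// replicate is a hypothetical explicit fan-out component. For every packet
// it receives it prepares distinct packets: the original for the first
// output and a freshly allocated copy for each remaining output.
// It assumes len(outs) >= 1.
func replicate(in <-chan *ip, outs []chan<- *ip) {
	for p := range in {
		// Make all copies before sending anything, so no receiver can
		// modify the original while it is still being copied.
		copies := make([]*ip, len(outs))
		copies[0] = p
		for i := 1; i < len(outs); i++ {
			copies[i] = &ip{Data: append([]byte(nil), p.Data...)}
		}
		for i, out := range outs {
			out <- copies[i]
		}
	}
	for _, out := range outs {
		close(out)
	}
}

// demoReplicate sends one packet through replicate and modifies one copy.
func demoReplicate() (first, second string, distinct bool) {
	in := make(chan *ip)
	a, b := make(chan *ip, 1), make(chan *ip, 1)
	go replicate(in, []chan<- *ip{a, b})
	in <- &ip{Data: []byte("hello")}
	close(in)

	pa, pb := <-a, <-b
	pa.Data[0] = 'J' // changing one packet leaves the other untouched
	return string(pa.Data), string(pb.Data), pa != pb
}

func main() {
	fmt.Println(demoReplicate()) // Jello hello true
}
```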

NoFlo is based on Node.js and is written in JavaScript and CoffeeScript. These languages basically support a single-threaded implementation, although they can achieve some asynchronism by the use of "callbacks". Although NoFlo and its relatives can simulate asynchronism to some extent, only one thing is happening at a time, and they are limited to using a single processor. It is very understandable that people assume that adding configurable modularity, componentry and visual design onto conventional programming should result in a powerful combination, without straying too far from the conventional programming they are used to. However, as I said above, such systems do not implement the FBP paradigm shift, so they miss out on a lot of the power of "classical" FBP.

Most of the rest of this article will be concerned with "classical" FBP. We will also be using the term "von Neumann paradigm" from time to time. For those unfamiliar with the term, it refers to a computer design where a single instruction counter walks through a program accessing a uniform array of non-destructive-readout memory cells. This has in fact been the standard computer architecture for several decades, but people are increasingly finding it inadequate for today's challenges, as shown by frequent cost and schedule overruns, weird bugs, and difficulty maintaining large applications. More and more writers have started to point out that these problems derive in large part from the architecture itself. Unfortunately programmers are exposed to this approach from the very start, and have a great deal of difficulty breaking loose from it! Ken Kan has pointed out this quote from Edsger Dijkstra (thanks, Ken!):

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.

With all due respect to Dijkstra, it's not just BASIC! There is a basic problem with the von Neumann paradigm itself, but, because we have been taught since the '50s that you can do anything with it, this paradigm is very seldom questioned. I have frequently detected a certain nervousness among programmers encountering FBP for the first time about not being able to control the exact timing of every event in a running application! This is in part due to the very sensitive nature of the von Neumann storage model, and the fact that it confuses data with its storage medium.

I have been wrestling with how best to convey the difference between the old "von Neumann" mental model of storage and that of FBP, and I am starting to think that the description in Chap. 3 of the book, "Flow-Based Programming", says it best. Since it is a little long for this essay, I am just going to include a paragraph from that chapter, and invite the reader to read the rest of Chap. 3 - Concepts online, or in their copy of the book.

... most of today’s computers have a uniform array of pigeon-holes for storage, and this storage behaves very differently from the way storage systems behave in real life. In real life, paper put into a drawer remains there until deliberately removed. It also takes up space, so that the drawer will eventually fill up, preventing more paper from being added. Compare this with the computer concept of storage – you can reach into a storage slot any number of times and get the same data each time (without being told that you have done it already), or you can put a piece of data in on top of a previous one, and the earlier one just disappears.... Although destructive storage is not integral to the von Neumann machine, it is assumed in many functions of the machine, and this is the kind of storage which is provided on most modern computers. Since the storage of these machines is so sensitive to timing and because the sequencing of every instruction has to be predefined (and humans make mistakes!), it is incredibly difficult to get a program above a certain complexity to work properly. And of course this storage paradigm has been enshrined in most of our higher level languages in the concept of a “variable”. In a celebrated article John Backus (1978) actually apologized for inventing FORTRAN! That’s what I meant earlier about the strange use of the equals sign in Higher Level Languages. To a logician the statement J = J + 1 is a contradiction (unless J is infinity?) – yet programmers no longer notice anything strange about it!
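The contrast drawn in this passage can be sketched in a few lines of Go (an illustration of the two storage models, not code from the book). An ordinary variable can be read any number of times and silently overwritten, whereas a bounded channel behaves more like the drawer: it fills up, and taking an item out removes it. The function names pigeonHole and drawer are hypothetical.

```go
package main

import "fmt"

// pigeonHole mimics conventional variables: reading does not consume the
// value, and assignment silently overwrites the previous one.
func pigeonHole() (firstRead, secondRead, after int) {
	j := 1
	firstRead, secondRead = j, j // same value twice; nothing records the reads
	j = j + 1                    // the old value of j simply disappears
	return firstRead, secondRead, j
}

// drawer mimics the "paper in a drawer" model with a bounded channel:
// items take up space, and taking one out removes it.
func drawer() (addedThird bool, taken string, left int) {
	d := make(chan string, 2)
	d <- "memo 1"
	d <- "memo 2"
	select {
	case d <- "memo 3": // would only succeed if there were room
		addedThird = true
	default: // the drawer is full; nothing is overwritten
	}
	taken = <-d // removes "memo 1"
	return addedThird, taken, len(d)
}

func main() {
	fmt.Println(pigeonHole()) // 1 1 2
	fmt.Println(drawer())     // false memo 1 1
}
```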

We sometimes refer to FBP as a "new/old" paradigm, because its approach and methodology have parallels with Unit Record systems, which were used for the first data processing applications and were highly asynchronous and component-oriented. When these applications started being replaced by computers, which seemed so much more powerful, a lot of useful concepts were lost... which FBP is now reintroducing.

An application built using FBP may be thought of as a "data processing factory": a network of independent "machines" communicating by means of conveyor belts, across which travel structured chunks of data that are modified by successive "machines" until they are output to files or discarded. The various "machines" run in parallel, or interleaved, depending on the number of processors available. This same image can be applied to networks of computers or other devices - Wayne Stevens pointed out that FBP provides a "consistent application view" from "maxi" to "mini". Granted, each FBP process is a von Neumann program, but it runs independently of all other processes, and so tends to be quite simple internally. Almost all of the data that an FBP process deals with is held in "information packets" (IPs) or in method-local storage. Unlike in conventional programming, the programmer does not have to worry about controlling the exact sequence of events - all s/he needs to concentrate on is the transformations that apply to the data to convert the original inputs to the desired outputs.
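Here is a minimal sketch of such a "factory" in Go, assuming one goroutine per "machine" and one bounded channel per "conveyor belt". The process names reader, upper and dropEmpty are hypothetical. Each contains only its own transformation and knows nothing about the timing of the others.

```go
package main

import (
	"fmt"
	"strings"
)

// reader emits each record onto its output belt.
func reader(records []string, out chan<- string) {
	defer close(out)
	for _, r := range records {
		out <- r
	}
}

// upper transforms each packet it receives.
func upper(in <-chan string, out chan<- string) {
	defer close(out)
	for r := range in {
		out <- strings.ToUpper(r)
	}
}

// dropEmpty discards some packets rather than passing them on.
func dropEmpty(in <-chan string, out chan<- string) {
	defer close(out)
	for r := range in {
		if r != "" {
			out <- r
		}
	}
}

// runFactory connects the machines and collects what comes off the last belt.
func runFactory(records []string) []string {
	c1, c2, c3 := make(chan string, 4), make(chan string, 4), make(chan string, 4)
	go reader(records, c1)
	go upper(c1, c2)
	go dropEmpty(c2, c3)

	var result []string
	for r := range c3 { // the final "machine" runs on the calling goroutine
		result = append(result, r)
	}
	return result
}

func main() {
	fmt.Println(runFactory([]string{"apples", "", "pears"})) // [APPLES PEARS]
}
```

The Go scheduler decides whether the three goroutines run in parallel or interleaved, which matches the point above: the network is the same either way.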

More importantly, the way data is viewed in FBP is completely different from the way it is viewed in conventional programming (and in many FBP-inspired systems): in FBP, data is managed in packets (IPs), which have a well-defined lifetime, from creation to destruction, and can only be owned by one process at a time, or be in transit between processes - just like real-life objects. In conventional programming, data does not have a well-defined lifetime or clear ownership, as the data is confused with its storage medium. This, in combination with the single-threaded restriction, leads to many of the weird bugs that bedevil today's complex systems, as they are so sensitive to the exact timing of events that a minor timing error can have catastrophic results!
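One way to picture IP lifetimes and single ownership is as bookkeeping that a runtime might perform on every create, send, receive and drop. The Tracker type below is a hypothetical Go sketch, not the mechanism of any particular FBP implementation. A process may only send or drop an IP it currently owns, and an IP that has been sent belongs to nobody until it is received.

```go
package main

import (
	"fmt"
	"sync"
)

// Tracker is hypothetical bookkeeping for IP lifetimes: at any moment an IP
// is owned by exactly one process, is in transit, or has been destroyed.
// Process names must be non-empty; "" is reserved for "in transit".
type Tracker struct {
	mu     sync.Mutex
	nextID int
	owner  map[int]string // live IP id -> owning process ("" = in transit)
}

func NewTracker() *Tracker { return &Tracker{owner: map[int]string{}} }

// Create starts an IP's lifetime, owned by the creating process.
func (t *Tracker) Create(proc string) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.nextID++
	t.owner[t.nextID] = proc
	return t.nextID
}

// move changes ownership only if the IP is live and currently held by from.
func (t *Tracker) move(id int, from, to string) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if cur, live := t.owner[id]; !live || cur != from {
		return fmt.Errorf("IP %d is not held by %q", id, from)
	}
	t.owner[id] = to
	return nil
}

// Send hands an IP to a connection: the sender no longer owns it.
func (t *Tracker) Send(id int, proc string) error { return t.move(id, proc, "") }

// Receive takes an in-transit IP off a connection.
func (t *Tracker) Receive(id int, proc string) error { return t.move(id, "", proc) }

// Drop ends the IP's lifetime; only its current owner may do this.
func (t *Tracker) Drop(id int, proc string) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if cur, live := t.owner[id]; !live || cur != proc {
		return fmt.Errorf("IP %d is not held by %q", id, proc)
	}
	delete(t.owner, id)
	return nil
}

// Live reports how many IPs have been created but not yet dropped.
func (t *Tracker) Live() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.owner)
}

func main() {
	t := NewTracker()
	id := t.Create("Reader")
	fmt.Println(t.Send(id, "Reader"))    // <nil>
	fmt.Println(t.Send(id, "Reader"))    // error: Reader no longer owns it
	fmt.Println(t.Receive(id, "Writer")) // <nil>
	fmt.Println(t.Drop(id, "Writer"), t.Live())
}
```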

FBP supports data processing applications (business or scientific), typically long-running and high volume, and, as we have shown, involves a way of thinking (the new "paradigm") that is fundamentally different from that of conventional programming. This paradigm is actually more similar to engineering than to conventional programming, and, not surprisingly, involves a period of what might be called "apprenticeship", during which the practitioner is getting comfortable using its concepts. Conventional programming, by comparison, is as if you gave an engineer a bunch of blueprints and some girders, and told him or her to go build a bridge! It's not surprising that so many systems built using conventional technologies in recent years have suffered from cost overruns, logic glitches, etc., etc., and the problem is getting worse!

While data-oriented models have been used for application design for a number of years, up until now there was no easy way of converting these designs into running programs. Programmers could indeed design systems using data-oriented thinking, but then had to laboriously convert these designs into procedural code. In comparison, FBP provides a seamless transition from design to implementation, and our experience with it shows that it results in more maintainable and in fact better performing systems. It also facilitates communication between designers, programmers, maintenance staff and users. One large program written using an early ("green thread") implementation of FBP had been running in production for almost 40 years (as of the beginning of 2014), processing millions of transactions a night, while undergoing continuous maintenance during all that time, often by people who weren't even born when it was written!

While an FBP process is a "black box" component with its own internal environment and control thread, a NoFlo process is essentially a cloud of callbacks linked by instance variables. By comparison, the FBP mental model of a single process is much simpler - indeed, very similar to that of conventional programming: basically, each process has a single high-level method, which can call subroutines in the normal way because each process has its own independent call stack. There is then no confusion between the method's local storage and the process object's instance variables. Henri Bergius was able to simulate many FBP-inspired characteristics on the Node.js infrastructure, but some rather basic, and necessary, FBP techniques have no obvious counterpart in NoFlo. For instance, basic FBP business functions such as "Collate" require a process to specify which port it wants to receive from, and to be able to suspend until data arrives at that port. This function, or something similar, is being introduced gradually into NoFlo, but it logically requires a related architectural concept, missing from NoFlo, called "back pressure", whereby an upstream process is suspended if the connection it feeds into becomes full.
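The following Go sketch shows the essential behaviour a Collate-style process needs, under simplifying assumptions: two input streams of integer keys, each already sorted, merged into one sorted stream. A real Collate does more than this; collate, feed and runCollate are hypothetical names. What matters is that the process chooses which port to receive from next and is suspended until a packet arrives on that specific port.

```go
package main

import "fmt"

// collate is a simplified, hypothetical Collate-style process: it merges
// two input streams that are already sorted by key.
func collate(in0, in1 <-chan int, out chan<- int) {
	defer close(out)
	a, ok0 := <-in0
	b, ok1 := <-in1
	for ok0 || ok1 {
		if ok0 && (!ok1 || a <= b) {
			out <- a
			a, ok0 = <-in0 // wait specifically on port 0
		} else {
			out <- b
			b, ok1 = <-in1 // wait specifically on port 1
		}
	}
}

// feed starts an upstream process that sends xs on a small bounded connection.
func feed(xs []int) <-chan int {
	ch := make(chan int, 1)
	go func() {
		defer close(ch)
		for _, x := range xs {
			ch <- x // suspended while the connection is full
		}
	}()
	return ch
}

func runCollate(xs, ys []int) []int {
	out := make(chan int, 1)
	go collate(feed(xs), feed(ys), out)
	var merged []int
	for v := range out {
		merged = append(merged, v)
	}
	return merged
}

func main() {
	fmt.Println(runCollate([]int{1, 4, 9}, []int{2, 3, 10})) // [1 2 3 4 9 10]
}
```

The bounded connections in feed supply the back pressure mentioned above: whichever upstream process is not currently being read from is simply suspended.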

In a major divergence from classical FBP, mentioned above, NoFlo lacks the concept of information packet (IP) "lifetimes", by which an IP is tracked from creation to destruction and can only be "owned" by a single process at a time, or be in transit between processes. Conventional programming (and NoFlo) confuses the data as "object" with the "location" of the data. This in fact is the reason so many subtle bugs show up in conventional programs. It also explains why NoFlo allows a single output port to connect to multiple input ports, implying automatic replication of data. If data is seen as an "object" with a well-defined lifetime, this makes very little sense: it would be as if a single soft-drink bottle could pass through two different machines at the same time!

Conversely, if you do not view data as an object, you will see nothing wrong with this image. Here is a description from Henri Bergius of how the basic send/receive linkage works in NoFlo:

The actual sending is a normal JavaScript event that triggers the connected inport's callback function. The inport puts the new IP into its buffer and notifies component, again via a callback

and with regard to "back pressure":

Right now the NoFlo buffers are only limited by system memory.

Adding limits and backpressure is certainly something to consider down the line. Hasn't really been a consideration for things NoFlo is usually used for, though.

Back pressure is the only way I am aware of that allows "infinite" amounts of data to be processed using finite resources! The NoFlo team tells me that they have been making changes to NoFlo to bring it closer to FBP, so we shall see what the future brings.
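The following Go sketch illustrates the claim, assuming a bounded channel stands in for an FBP connection. The function runBounded is hypothetical. Whatever the total number of packets, the producer is suspended whenever the connection is full, so the number of packets in flight never exceeds the capacity plus one (the extra one allows for a packet the consumer has received but not yet counted).

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// runBounded pushes n packets through a connection of the given capacity
// and reports their sum and the largest number observed in flight.
func runBounded(n, capacity int) (sum int64, maxInFlight int64) {
	conn := make(chan int, capacity)
	var received int64 // updated by the consumer, read by the producer

	go func() {
		defer close(conn) // happens after the last write to maxInFlight
		var sent int64
		for i := 1; i <= n; i++ {
			conn <- i // back pressure: suspends here while conn is full
			sent++
			if d := sent - atomic.LoadInt64(&received); d > maxInFlight {
				maxInFlight = d
			}
		}
	}()

	for v := range conn {
		atomic.AddInt64(&received, 1)
		sum += int64(v)
	}
	// Safe to read maxInFlight: the producer closed conn after its last write.
	return sum, maxInFlight
}

func main() {
	sum, peak := runBounded(100000, 4)
	fmt.Println(sum, peak <= 5) // 5000050000 true
}
```

Memory use here depends on the capacity, not on n. With an unbounded buffer, nothing would stop a fast producer from queuing packets until memory runs out.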

While NoFlo is appealing because of its ability to support both client- and server-side processing, thanks to Node.js, and because of JavaScript's close integration with HTML, it is still tied closely to "von Neumann thinking". All existing implementations of FBP on my GitHub repository can take advantage of multiple cores, with the exception of JSFBP, which has since been archived - see below. Because of JavaScript's restriction to a single core, neither NoFlo nor JSFBP supports CPU-intensive applications, which are in fact well supported by JavaFBP, C#FBP, and C++FBP using Boost. It should be pointed out that the first FBP implementation used "green threads" with multiple stacks, so it used the same programming style as today's FBP implementations. The underlying OS of that early implementation also supported asynchronous I/O, so, although we only had a single processor, performance was excellent. In fact, run time was often better than with conventional programming, because, if a single process was suspended because of I/O, the whole job step did not have to be suspended, as is the case in conventional programming.

Because the NoFlo people use the term FBP so prominently when talking about NoFlo, we will also often prefix "FBP" with the term "classical" when it is necessary to distinguish it from NoFlo and other FBP-inspired frameworks. A number of the latter are starting to appear, such as IBM's recent Node-RED, but, like NoFlo, these systems are different in important ways from classical FBP, based as they are on von Neumann thinking. There is clearly common ground, but our experience shows that it is the FBP paradigm change that offers the most leverage for improved productivity and maintainability in application development. Ken Kan, who has several years' experience with NoFlo, says:

It is too easy to just make FBP work for JS, but what we really want to do is make JS work for FBP!

An FBP implementation written in JavaScript, called JSFBP, was also developed. It was based on node-fibers, a package developed by Marcel Laverdet (and therefore, in turn, on Node.js), but it has since been archived, as node-fibers is no longer supported.

Recently, I have been working on a new implementation using the Go language - this is appealing because Go has the built-in mechanism of "goroutines", which support FBP in a natural way. This implementation, called GoFBP, seems to have pretty much stabilized, at least as far as its API is concerned. The internals may continue to change for a little while, so people wanting to use it should always download the latest version.
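As a rough illustration of why goroutines fit FBP (this is plain Go, not GoFBP's API), the hypothetical process below walks a tree with ordinary recursion and sends each node on its output connection. Because each goroutine has its own stack, the process can be suspended on a send deep inside the recursion, which is exactly the kind of technique that is awkward in a callback-driven design.

```go
package main

import "fmt"

// node is a hypothetical tree record.
type node struct {
	name     string
	children []*node
}

// walk is ordinary recursive code. It may block on a send at any depth;
// the goroutine's own stack preserves where it was.
func walk(n *node, depth int, out chan<- string) {
	out <- fmt.Sprintf("%d:%s", depth, n.name)
	for _, c := range n.children {
		walk(c, depth+1, out)
	}
}

// treeWalker is the process: one goroutine, one high-level function.
func treeWalker(root *node, out chan<- string) {
	defer close(out)
	walk(root, 0, out)
}

// flatten runs treeWalker and collects its output.
func flatten(root *node) []string {
	out := make(chan string) // unbuffered: every send waits for the reader
	go treeWalker(root, out)
	var lines []string
	for s := range out {
		lines = append(lines, s)
	}
	return lines
}

func main() {
	root := &node{"a", []*node{{"b", []*node{{"c", nil}}}, {"d", nil}}}
	fmt.Println(flatten(root)) // [0:a 1:b 2:c 1:d]
}
```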

In conclusion, I thought I would compare one commonly used component in classical FBP against the same function written in NoFlo. The result is in "Concat" Component.
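For a feel for the classical style, here is a Go sketch of a Concat-style process, assuming the usual semantics: it passes through everything arriving on input element 0 until that stream ends, then everything from element 1, and so on. The names concat, source and runConcat are hypothetical, and this is not the code from the comparison.

```go
package main

import "fmt"

// concat is a sketch of a Concat-style process with an array input port.
// Upstream processes feeding later elements are simply suspended (their
// bounded connections fill up) until concat gets round to them.
func concat(ins []<-chan string, out chan<- string) {
	defer close(out)
	for _, in := range ins {
		for s := range in { // drain this element completely before the next
			out <- s
		}
	}
}

// source starts an upstream process on a small bounded connection.
func source(items ...string) <-chan string {
	ch := make(chan string, 1)
	go func() {
		defer close(ch)
		for _, s := range items {
			ch <- s
		}
	}()
	return ch
}

// runConcat builds one source per group and collects concat's output.
func runConcat(groups ...[]string) []string {
	ins := make([]<-chan string, len(groups))
	for i, g := range groups {
		ins[i] = source(g...)
	}
	out := make(chan string)
	go concat(ins, out)
	var got []string
	for s := range out {
		got = append(got, s)
	}
	return got
}

func main() {
	fmt.Println(runConcat([]string{"a1", "a2"}, []string{"b1"}, []string{"c1", "c2"}))
	// [a1 a2 b1 c1 c2]
}
```

All sources start at once, but ordering comes for free: the process just reads its ports in sequence, with no state machine or callbacks.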

For those wishing to gain experience with FBP, there is no substitute for reading the book (Flow-based Programming, 2nd edition), and then starting to use one of the FBP implementations such as JavaFBP, C#FBP or JSFBP, or even the C++/Boost implementation currently under development, as described on the FBP web site. JavaFBP has the advantage of being closely integrated with a powerful diagramming tool called DrawFBP, although DrawFBP can support any data flow language - and indeed can support high-level, language-independent design as well.

For the time being, users wishing to work with FBP can code up networks by hand using JavaFBP, C#FBP, CppFBP or JSFBP. Alternatively, they can use the DrawFBP drawing tool, written using Java Swing, which is also quite general, and can in fact generate networks for JavaFBP and C#FBP, as well as the .fbp notation used by NoFlo and CppFBP, plus NoFlo JSON networks. If JavaFBP is chosen, DrawFBP can load any chosen components, display their descriptions and ports, and even check whether all required ports are connected.

While DrawFBP does not support run-time network execution, except in the case of JavaFBP, the networks it generates are complete programs. Its diagrams are stored in XML format, and additional generators can be added easily, or users can build their own generators using the XML format as input. DrawFBP also has the capability of carving out a piece of a network and converting it into a subnet.

FBP and multithreading

I recently wrote an article trying to describe the relationship between multithreading in programming languages and FBP, which clarified my thinking to some extent - see FBP and multithreading. Feel free to disagree, however! By the way, in that article I use the term "real" FBP rather than "classical" FBP - same thing, though!

FBP and OO

For a discussion of the differences and similarities between FBP and OO, see Comparison between FBP and Object-Oriented Programming (Chapter 25 of the 2nd edition).