Flow-based Programming

Πάντα ῥεῖ (Panta rhei) - Everything flows.

FBP vs. FBP-inspired Systems

Some readers may have arrived at Flow-Based Programming (FBP) by way of NoFlo, which is a JavaScript-based system motivated by my book Flow-based Programming, and which uses a number of the same terms and concepts. This project, started by Henri Bergius in 2012 and now called Flowhub, implements a number of the FBP concepts, and has been creating a significant buzz worldwide since then. In the fall of 2016, Flowhub and other flow-based programming assets were purchased from The Grid, and are now up and running as Flowhub UG, a company registered in Berlin, Germany.

While a lot of credit is due the NoFlo team for bringing FBP to the attention of the computer world, what the developers of NoFlo call "FBP" differs in a number of respects from FBP as it has evolved over the last 40+ years, and in fact operates on a completely different paradigm. Although NoFlo shares a number of technical and philosophical ideas with FBP, it is much more similar to what we now call "conventional" programming - procedural, algorithmic, one-action-at-a-time - and does not truly embody the "FBP paradigm shift", in which application development can be likened to designing a data processing "factory". The latter is a very different way of looking at application development.

Ali Razeen, at Duke University, has pointed out in an insightful 2015 note that a number of people have now built software which has the componentry and "configurable modularity" features of FBP, usually in combination with some visual representation, and assume they have built an FBP implementation. He then goes on to say that these should not be viewed as true FBP implementations, as they are missing some key characteristics of true FBP - mainly, asynchronism and information packets with unique ownership and lifetime - and so typically miss out on the critical paradigm shift... and a number of its attendant benefits. NoFlo and Node-RED are examples of this type of system. Because of the proliferation of such packages, we will use the term "FBP-inspired" (as suggested by Joe Witt of HortonWorks) when it is necessary to distinguish between them and FBP proper. You may also see the phrase "classical FBP" showing up from time to time, particularly in discussions with proponents of "FBP-inspired" systems.

John Cowan in the Google Group on FBP says the following:

The easiest way to understand the difference is to take the point of view of the component programmer.

In classical FBP, all components are autonomous. They can read from any of their input ports whenever they want to, and can write to any of their output ports whenever they want to. If the input port is empty or the output port full, the component waits transparently until things change. This is a model familiar to all programmers, because it is exactly how files are processed. A program does not sit in an event loop waiting for the OS to push the next block of a file at it, or send an event that says "Write your next block to the output file now".

In FBP-inspired systems, typically any component can write to a port whenever it wants, but components only have a single input port and sit waiting for a packet to be pushed to them. This behavior allows a single-threaded implementation of the whole system, but each component is controlled by its upstream partner(s) rather than being autonomous. This pattern has become very familiar to GUI programmers, whose programs typically are event loops because they need to be responsive to unpredictable user actions, but the program has to be turned inside out (an "inversion of control") which makes certain natural techniques like recursion difficult or impossible.

It should also be pointed out that FBP-inspired systems usually allow one output port to be connected to multiple input ports - this does not make any sense in a classical FBP system as it would be like being able to send the same package to several of your friends all at the same time!
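As a rough illustration of the distinction Cowan draws, here is a minimal Python sketch. It is not taken from any FBP implementation: the names pair_sum_pull, PairSumPush and END are hypothetical, and a bounded queue.Queue simply stands in for a connection. Both versions add up successive pairs of incoming values. The first owns its own loop and blocks on its ports; the second is driven by its upstream partner and has to park its state in an instance variable between calls.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker, not part of any FBP API


def pair_sum_pull(inport, outport):
    """Classical style: the component runs its own loop and blocks on its ports."""
    while True:
        first = inport.get()         # waits transparently while the port is empty
        if first is END:
            break
        second = inport.get()
        if second is END:
            break
        outport.put(first + second)  # waits transparently while the port is full
    outport.put(END)


class PairSumPush:
    """FBP-inspired style: upstream pushes packets in; control is inverted."""

    def __init__(self, emit):
        self.emit = emit
        self.pending = None          # state the pull version keeps in a local variable

    def on_packet(self, value):
        if self.pending is None:
            self.pending = value
        else:
            self.emit(self.pending + value)
            self.pending = None


def run_pull(values):
    inport = queue.Queue(maxsize=2)   # bounded connections
    outport = queue.Queue(maxsize=2)
    results = []

    def collect():
        while True:
            item = outport.get()
            if item is END:
                break
            results.append(item)

    threads = [
        threading.Thread(target=pair_sum_pull, args=(inport, outport)),
        threading.Thread(target=collect),
    ]
    for t in threads:
        t.start()
    for v in values:
        inport.put(v)
    inport.put(END)
    for t in threads:
        t.join()
    return results


def run_push(values):
    results = []
    component = PairSumPush(results.append)
    for v in values:
        component.on_packet(v)       # the caller decides when the component runs
    return results
```

Both run_pull and run_push produce the same sums for the same input; what differs is who holds the thread of control and where the intermediate state lives.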

NoFlo is based on Node.js and is written in JavaScript and CoffeeScript. These languages basically support a single-threaded implementation, although they can achieve some asynchronism by the use of "callbacks". Although NoFlo and its relatives can simulate asynchronism to some extent, only one thing is happening at a time, and they are limited to using only a single processor. It is very understandable that people assume that adding configurable modularity, componentry and visual design onto conventional programming should result in a powerful combination, without getting too far away from the conventional programming that they are used to. However, it is my feeling, backed up by several decades of experience, that this does not really result in an improved developer experience or more maintainable systems.

We will be using the term "von Neumann paradigm" from time to time. For those unfamiliar with the term, it refers to a computer design where a single instruction counter walks through a program accessing a uniform array of non-destructive-readout memory cells. This has in fact been the standard computer architecture for several decades, but people are increasingly finding it inadequate for today's challenges, as shown by frequent cost and schedule overruns, weird bugs, and difficulty maintaining large applications. More and more writers have started to point out that these problems derive in large part from the architecture itself. Unfortunately programmers are exposed to this approach from the very start, and have a great deal of difficulty breaking loose from it! Ken Kan has pointed out this quote from Edsger Dijkstra (thanks, Ken!):

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.

With all due respect to Dijkstra, it's not just BASIC! There is a basic problem with the von Neumann paradigm itself, but, because we have been taught since the '50s that you can do anything with it, this paradigm is very seldom questioned. I have frequently detected a certain degree of nervousness on the part of many programmers encountering FBP for the first time, at not being able to control the exact timing of every event in a running application! This is in part due to the very sensitive nature of the von Neumann storage model, and the fact that it confuses data with its storage medium.

I have been wrestling with how best to convey the difference between the old "von Neumann" storage mental model and that of Flow-Based Programming, and I am starting to think that the description in Chap. 3 of the book, "Flow-Based Programming", says it best. Since it is a little long for this essay, I would ask the reader to click on Chap. 3 - Concepts online, or look it up in their copy of the book - do a find on Fig. 3.1, and continue from there.

We sometimes refer to FBP as a "new/old" paradigm, because in fact its approach and methodology have parallels with Unit Record systems, which were used for the first data processing applications and were highly asynchronous and component-oriented. When these applications started being replaced by computers, which seemed so much more powerful, a lot of useful concepts were lost... which FBP is now reintroducing.

An application built using FBP may be thought of as a "data processing factory": a network of independent "machines", communicating by means of conveyor belts, across which travel structured chunks of data, which are modified by successive "machines" until they are output to files or discarded. The various "machines" run in parallel, or interleaved, as determined by the number of processors in the machine. It should be pointed out that this same image can be applied to networks of computers or other devices - Wayne Stevens pointed out that FBP provides a "consistent application view" from "maxi" to "mini". Granted each FBP process is a von Neumann program, but it runs independently of all other processes, and so tends to be quite simple internally. Almost all of the data that an FBP process deals with is held in "information packets" (IPs) or in method local storage. Unlike in conventional programming, the programmer does not have to worry about controlling the exact sequence of events - all s/he needs to concentrate on is the transformations that apply to the data to convert the original inputs to the desired output.
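The "factory" image can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component names (generate, add_tax, collect), the END marker and the use of threads with bounded queue.Queue objects as conveyor belts are all my own choices here, not the API of JavaFBP or any other implementation. In standard CPython, threads doing CPU-bound work interleave rather than use several cores, so this shows the structure of a network, not the multi-core benefit.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def generate(out, count):
    """A 'machine' that creates packets and places them on a belt."""
    for n in range(count):
        out.put({"id": n, "amount": n * 10})
    out.put(END)


def add_tax(inp, out, rate):
    """A 'machine' that modifies each packet as it passes through."""
    while True:
        ip = inp.get()
        if ip is END:
            break
        ip["total"] = round(ip["amount"] * (1 + rate), 2)
        out.put(ip)
    out.put(END)


def collect(inp, sink):
    """A 'machine' that takes packets off the last belt."""
    while True:
        ip = inp.get()
        if ip is END:
            break
        sink.append(ip)


def run_factory(count=5, rate=0.5, capacity=2):
    # The network definition: two belts connecting three machines.
    belt_a = queue.Queue(maxsize=capacity)
    belt_b = queue.Queue(maxsize=capacity)
    sink = []
    machines = [
        threading.Thread(target=generate, args=(belt_a, count)),
        threading.Thread(target=add_tax, args=(belt_a, belt_b, rate)),
        threading.Thread(target=collect, args=(belt_b, sink)),
    ]
    for m in machines:
        m.start()
    for m in machines:
        m.join()
    return sink
```

Note that none of the three functions knows anything about the others: the wiring lives entirely in run_factory, and the order in which the machines actually get to run is never specified.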

More importantly, the ways data is viewed in FBP vs. conventional programming (as well as many FBP-inspired systems) are completely different: in FBP, data is managed in packets (IPs), which have a well-defined lifetime, from creation to destruction, and can only be owned by one process at a time, or be in transit between processes - just like real-life objects. In conventional programming, data does not have a well-defined lifetime or clear ownership, as the data is confused with its storage medium. This, in combination with the single-threaded restriction, leads to many of the weird bugs that bedevil today's complex systems, as conventional code is so sensitive to the exact timing of events that a minor timing error can have catastrophic results!
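To make the ownership rule concrete, here is a small Python sketch. It is a toy model only: Packet, Process and OwnershipError are hypothetical names, a deque stands in for a connection, and real implementations enforce these rules in their own ways. The point is simply that a packet is either owned by exactly one process or in transit, and that it has a definite end of life.

```python
from collections import deque


class OwnershipError(Exception):
    """Raised when a process touches a packet it does not own (hypothetical)."""


class Packet:
    def __init__(self, contents, owner):
        self.contents = contents
        self.owner = owner   # the owning Process, or None while in transit
        self.alive = True    # False once the packet has been dropped


class Process:
    def __init__(self, name):
        self.name = name
        self.owned = 0       # number of packets this process currently owns

    def create(self, contents):
        self.owned += 1
        return Packet(contents, self)

    def _check(self, packet):
        if not packet.alive or packet.owner is not self:
            raise OwnershipError(f"{self.name} does not own this packet")

    def send(self, packet, connection):
        self._check(packet)
        packet.owner = None          # in transit: nobody owns it
        self.owned -= 1
        connection.append(packet)

    def receive(self, connection):
        packet = connection.popleft()
        packet.owner = self          # ownership passes to the receiver
        self.owned += 1
        return packet

    def drop(self, packet):
        self._check(packet)
        packet.owner = None
        packet.alive = False         # end of the packet's lifetime
        self.owned -= 1


def demo():
    reader, writer = Process("reader"), Process("writer")
    connection = deque()
    ip = reader.create("hello")
    reader.send(ip, connection)
    try:
        reader.send(ip, connection)  # reader gave the packet away already
        second_send_rejected = False
    except OwnershipError:
        second_send_rejected = True
    received = writer.receive(connection)
    writer.drop(received)
    return second_send_rejected, reader.owned, writer.owned, received.alive
```

In this model a process that finishes with a non-zero owned count has lost track of a packet, which is the kind of error that is hard even to express when data is just a location in shared storage.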

FBP supports data processing applications (business or scientific), typically long-running and high volume, and, as we have shown, involves a way of thinking (the new "paradigm") that is fundamentally different from that of conventional programming. This paradigm is actually more similar to engineering than to conventional programming, and, not surprisingly, involves a period of what might be called "apprenticeship", during which the practitioner is getting comfortable using its concepts. Conventional programming, by comparison, is as if you gave an engineer a bunch of blueprints and some girders, and told him or her to go build a bridge! It's not surprising that so many systems built using conventional technologies in recent years have suffered from cost overruns, logic glitches, etc., etc., and the problem is getting worse!

While data-oriented models have been used for application design for a number of years, up until now there was no easy way of converting these designs into running programs. Programmers could indeed design systems using data-oriented thinking, but then had to laboriously convert these designs into procedural code. In comparison, FBP provides a seamless transition from design to implementation, and our experience with it shows that it results in more maintainable and in fact better performing systems. It also facilitates communication between designers, programmers, maintenance staff and users. One large program written using an early ("green thread") implementation of FBP had been running in production for almost 40 years (as of the beginning of 2014), processing millions of transactions a night, while undergoing continuous maintenance during all that time, often by people who weren't even born when it was written!

While an FBP process is a "black box" component with its own internal environment and control thread, a NoFlo process is essentially a cloud of callbacks linked by instance variables. By comparison, the FBP mental model of a single process is much simpler - indeed, very similar to that of conventional programming - as basically each process has a single high-level method, which can then call subroutines in the normal way, as each process has its own independent call stack. There is then no confusion between the method's local storage and the process object's instance variables. Henri Bergius was able to simulate many FBP-inspired characteristics on the Node.js infrastructure, but some rather basic, and necessary, FBP techniques have no obvious counterpart in NoFlo. For instance, basic FBP business functions such as "Collate" require a process to be specific about which port it wants to receive from, and to be able to suspend until data arrives at that port - this function, or something similar, is being introduced gradually into NoFlo, but it logically requires a related architectural concept, missing from NoFlo, called "back pressure", where an upstream process will be suspended if the connection it feeds into becomes full. Back pressure is the only way I am aware of that allows "infinite" amounts of data to be processed using finite resources! The NoFlo team tells me that they have been making changes to NoFlo to bring it closer to FBP, so we shall see what the future brings.
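The following Python sketch shows why a "Collate"-style function needs both selective receive and back pressure. It is a simplified stand-in, not the Collate component from any FBP implementation: it merely merges two key-sorted streams, the names (collate, send_all, run_collate, END) are hypothetical, and bounded queue.Queue objects play the role of connections. With a capacity of 1, each sender is suspended as soon as its connection is full, and resumes only when the collating process chooses to receive from that particular port.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def send_all(items, out):
    """Upstream process: suspended by put() whenever the connection is full."""
    for item in items:
        out.put(item)
    out.put(END)


def collate(in0, in1, out):
    """Merge two key-sorted streams, deciding which port to receive from next."""
    a, b = in0.get(), in1.get()
    while a is not END or b is not END:
        if b is END or (a is not END and a[0] <= b[0]):
            out.put(a)
            a = in0.get()   # suspend until data arrives at port 0 specifically
        else:
            out.put(b)
            b = in1.get()   # suspend until data arrives at port 1 specifically
    out.put(END)


def run_collate(masters, details, capacity=1):
    in0 = queue.Queue(maxsize=capacity)
    in1 = queue.Queue(maxsize=capacity)
    out = queue.Queue(maxsize=capacity)
    merged = []

    def collect():
        while True:
            ip = out.get()
            if ip is END:
                break
            merged.append(ip)

    threads = [
        threading.Thread(target=send_all, args=(masters, in0)),
        threading.Thread(target=send_all, args=(details, in1)),
        threading.Thread(target=collate, args=(in0, in1, out)),
        threading.Thread(target=collect),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return merged
```

However long the two input streams are, no connection ever holds more than capacity packets at once, which is the sense in which back pressure lets finite resources handle unbounded amounts of data.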

While NoFlo is appealing because of its ability to support both client- and server-side processing, thanks to Node.js, and because of JavaScript's close integration with HTML, it is still tied closely to "von Neumann thinking". All existing implementations of FBP on my GitHub directory can take advantage of multiple cores, with the exception of JSFBP. Because of JavaScript's restriction to a single core, neither NoFlo nor JSFBP supports CPU-intensive applications, which are in fact well supported by JavaFBP, C#FBP and C++FBP using Boost. It should be pointed out that the first FBP implementation used "green threads" with multiple stacks, so you could use the same programming style as we do in today's FBP implementations. The underlying OS also supported asynchronous I/O, so, although we only had a single processor, performance was excellent - in fact run time was often better than with conventional programming, because, if a single process was suspended because of I/O, the whole job step did not have to be suspended, as is the case in conventional programming.

Recently two colleagues and I have been working on an FBP implementation, written in JavaScript, called JSFBP, based on node-fibers, a package developed by Marcel Laverdet, and therefore in turn on Node.js. This is actually a "green thread" implementation, as were the first two FBP implementations running on IBM mainframes. However, "green threads" do not support multiple cores, a limitation shared by NoFlo, and I am told that JSFBP's dependence on "node-fibers" will likely prevent it from gaining wide acceptance. However, this implementation is very much an FBP implementation, so I have made it available via GitHub, as there may be a role for it in the future.

In conclusion, I thought I would compare one commonly used component in classical FBP against the same function written in NoFlo. The result is in "Concat" Component.

For those wishing to gain experience with FBP, there is no substitute for reading the book (Flow-based Programming, 2nd edition), and then starting to use one of the FBP implementations such as JavaFBP, C#FBP or JSFBP, or even the C++/Boost implementation currently under development, as described on the FBP web site. JavaFBP has the advantage of being closely integrated with a powerful diagramming tool, called DrawFBP, although DrawFBP can support any data flow language - and indeed can support high-level, language-independent, design as well.

For the time being, users wishing to work with FBP can code up networks by hand using JavaFBP, C#FBP, CppFBP or JSFBP. Alternatively, they can use the DrawFBP drawing tool, written using Java Swing, which is also quite general, and can in fact generate networks for JavaFBP and C#FBP, as well as the .fbp notation used by NoFlo and CppFBP, plus NoFlo JSON networks. If JavaFBP is chosen, DrawFBP can load any chosen components, display their descriptions and ports, and even check whether all required ports are connected.

FBP and OO

For a discussion of the differences and similarities between FBP and OO, see Comparison between FBP and Object-Oriented Programming (Chapter 25 of the 2nd edition).
